TABLE OF CONTENTS
Page
FOREWORD ............................................................................................................. ix
TABLE OF CONTENTS .......................................................................................... xi
ABBREVIATIONS ................................................................................................. xiii
SYMBOLS ................................................................................................................ xv
LIST OF TABLES ................................................................................................. xvii
LIST OF FIGURES ................................................................................................ xix
SUMMARY ............................................................................................................. xxi
ÖZET ...................................................................................................................... xxv
1. INTRODUCTION .................................................................................................. 1
1.1 Literature Review ............................................................................................... 3
1.1.1 Super-resolution techniques ........................................................................ 4
1.1.2 Datasets in super-resolution ........................................................................ 8
1.2 Purpose and Hypotheses of the Thesis ............................................................ 12
2. HISTORICAL AERIAL PHOTOGRAPH ......................................................... 15
3. SUPER-RESOLUTION PRACTICES ............................................................... 21
3.1 Approach to Dataset Content ........................................................................... 21
3.2 Approach to Dataset Structure ......................................................................... 29
3.3 Super-Resolution Implementation ................................................................... 33
4. IMAGE QUALITY ASSESSMENT ................................................................... 37
4.1 Image Quality Metrics ..................................................................................... 37
4.2 Visual Interpretation ........................................................................................ 39
5. RESULTS AND DISCUSSION ........................................................................... 41
6. CONCLUSIONS AND RECOMMENDATIONS ............................................. 55
REFERENCES ......................................................................................................... 59
CURRICULUM VITAE .......................................................................................... 67
ABBREVIATIONS
BRISQUE : Blind/Referenceless Image Spatial Quality Evaluator
DL : Deep Learning
EDSR : Enhanced Deep Super-Resolution
GAN : Generative Adversarial Network
HAP : Historical Aerial Photograph
HR : High Resolution
LR : Low Resolution
MFSR : Multi-Frame Super-Resolution
MSE : Mean Squared Error
PIE : Photo Interpretation Elements
PSNR : Peak Signal to Noise Ratio
RCAN : Residual Channel Attention Network
RDN : Residual Dense Network
RMSE : Root Mean Squared Error
RS : Remotely Sensed
SAN : Second Order Attention Network
SISR : Single Image Super-Resolution
SR : Super-Resolution
SRCNN : Super-Resolution Convolutional Neural Network
UAV : Unmanned Aerial Vehicle
UIQI : Universal Image Quality Index
VDSR : Very Deep Super-Resolution
SYMBOLS
l : Luminance
c : Contrast
s : Structure
α, β, γ : Positive constants
μ : Local mean
σ : Standard deviation
LIST OF TABLES
Page
Table 1.1 : Total number of parameters for the basic SR models. .............................. 8
Table 1.2 : Common and freely available datasets. ..................................................... 9
Table 1.3 : Common and freely available datasets consisting of RS images. ........... 10
Table 3.1 : Limiting effects and minimizing solutions. ............................................. 24
Table 5.1 : Metric results obtained for different classes from dataset-1 and dataset-2 in the approach to dataset content. ........................................................... 46
Table 5.2 : Metric results obtained for different mixed areas from dataset-1 and dataset-2 in the approach to dataset content. ............................................ 46
Table 5.3 : Metric results obtained for different mixed areas from the images concatenated according to the average of the images (approach to dataset structure). .................................................................................... 51
Table 5.4 : Metric results obtained for different mixed areas from the images concatenated according to BRISQUE values of the images (approach to dataset structure). ...................................................................................... 51
LIST OF FIGURES
Page
Figure 1.1 : Different examples of radiometric resolution levels: 4-bit image (left) and 8-bit image (right). ........................................................................... 2
Figure 1.2 : Relationship between resolution and pixel density. ................................ 2
Figure 1.3 : Examples of training images from the datasets in the literature. ............ 9
Figure 1.4 : Examples of test images from the most used datasets in the literature. 10
Figure 1.5 : Examples of sample images from the WHU-RS19 dataset. .................. 11
Figure 2.1 : Historical development of photogrammetry. ......................................... 15
Figure 2.2 : An example of HAP from Vermont Data Portal. .................................. 16
Figure 2.3 : Relationship between spatial resolution and GSD. ............................... 18
Figure 2.4 : Details of different clarity in the same photo. ....................................... 18
Figure 2.5 : Grayscale historical orthophoto images (a) Year: 1954 Resolution: 30 cm; (b) Year: 1968 Resolution: 40 cm; (c) Year: 1982 Resolution: 10 cm; (d) Year: 1993 Resolution: 40 cm. .............................................. 19
Figure 3.1 : Applied methodology for the approach to dataset content. ................... 22
Figure 3.2 : Examples from training dataset. ............................................................ 23
Figure 3.3 : Multi-spectral image and separate band. ............................................... 24
Figure 3.4 : Imitating multi-spectral data.................................................................. 26
Figure 3.5 : Importance of color information for the diversity in an image. ............ 27
Figure 3.6 : Intensity distributions of dataset-1 and dataset-2. ................................. 28
Figure 3.7 : Training image in dataset-1 (1600x1600) and dataset-2 (512x512). .... 29
Figure 3.8 : Photo interpretation elements and their hierarchy. ................................ 30
Figure 3.9 : Photo interpretation elements and corresponding images. .................... 32
Figure 3.10 : Methodology depending on the average concatenation of the individually enhanced images................................................................. 33
Figure 3.11 : Methodology depending on the BRISQUE values of the images. ...... 33
Figure 3.12 : SRCNN network architecture. ............................................................. 34
Figure 5.1 : Visual comparison between LR, HR, and improved images for farmland class......................................................................................................... 42
Figure 5.2 : Visual comparison between LR, HR, and improved images for residential class. ...................................................................................... 43
Figure 5.3 : Visual comparison between LR, HR, and improved images for forest class......................................................................................................... 44
Figure 5.4 : Visual comparison between LR, HR, and improved images for bareland class......................................................................................................... 45
Figure 5.5 : Visual comparison between enhanced images for mixed area (approach to dataset content). .................................................................................. 47
Figure 5.6 : Visual comparison between enhanced images for mixed area (approach to dataset content). .................................................................................. 48
Figure 5.7 : Visual comparison between enhanced images for mixed area (approach to dataset structure). ................................................................................ 49
Figure 5.8 : Visual comparison between enhanced images for mixed area (approach to dataset structure). ................................................................................ 50
ASSESSING THE IMPACT OF SUPER-RESOLUTION ON ENHANCING THE SPATIAL QUALITY OF HISTORICAL AERIAL PHOTOGRAPHS
SUMMARY
The level of distinguishability of details in an image is called resolution. In current studies, high-resolution (HR) images are generally preferred. However, not all available images may have sufficient resolution to fulfill their intended purpose. Due to hardware and cost constraints, it is not always feasible to obtain or procure HR images; hence, low-resolution (LR) images need to be enhanced. This process is possible through techniques known as super-resolution (SR).
SR is defined as obtaining an HR image from an LR one. It is accepted that an LR image is a degraded version of its HR counterpart. When detrimental effects are applied to an HR image, some information will be lost. Consequently, a lower-quality image will be obtained, which is referred to as LR. However, the image in need of enhancement is the LR one, while the unavailable image is the HR one. Therefore, transitioning from LR to HR is an inverse problem. To solve this problem, the lost information must be identified and restored to the LR image.
In current SR studies, deep learning (DL) based models are now being utilized. Various network designs are employed to enhance model performance and achieve better image quality. These designs primarily include linear learning, residual learning, recursive learning, multi-scale learning, dense connections, generative adversarial networks, and attention mechanisms.
DL-based SR studies initially began with the use of linear learning in the Super-Resolution Convolutional Neural Network (SRCNN) model. After linear learning, models utilizing residual learning, with the perspective that deeper networks yield higher performance, gained prominence. Due to the practical challenges posed by the increased number of parameters in deeper networks, recursive learning has been introduced in image processing studies. Recursive learning, based on the principle of parameter sharing to control the total number of parameters, allowed models to run much faster but introduced the vanishing gradient problem. In this context, densely connected models incorporating both residual learning and recursive learning were proposed. Subsequently, visually high-quality images were obtained using generative adversarial network structures. Nowadays, there is a focus on attention mechanisms in SR studies. In summary, to improve model performance, learning strategies were altered, various loss functions were tested, and network architectures were modified with various hyperparameters. However, all efforts have been solely algorithm-based, and satisfactory results have actually been achieved, especially with attention mechanisms.
One aspect that has not yet been fully addressed in SR studies is the impracticality of using deeper and more complex structures in real-time applications and the inability of models built on common datasets to deliver the expected performance in enhancing images for solving real-engineering problems. For the former, the performance rates
of lightweight network architectures should be increased. For the latter, specific approaches tailored to solving the problem should be introduced.
The remotely sensed (RS) images that have been scarcely evaluated in SR studies are historical aerial photographs (HAP). Besides the negative effects involved in enhancing RS images, HAPs have additional constraints. Information losses during the conversion of printed copies to digital copies, the data acquisition hardware used depending on the technological possibilities of the era, and the lack of spectral bands and color information are the main constraints. Since HAPs play a crucial role in solving present-day problems related to the past, they also need to be improved with SR techniques.
In this thesis study, it is aimed to enhance the spatial quality of grayscale HAPs with a DL-based SR model. In this context, approaches have been introduced regarding the content and structure of the dataset. Orthophotos of different years and different resolutions obtained from the General Directorate of Mapping have been used as the primary data source. The acquired orthophotos belong to 1954 with a resolution of 30 cm, 1968 with resolutions of 40 cm and 70 cm, 1982 with a resolution of 10 cm, and 1993 with a resolution of 40 cm.
In the approach to dataset content, images of the residential area, farmland, forested area, and bare land classes were extracted separately from the orthophotos to create datasets. DL-based SR models cannot be used directly on HAPs because they are built on multi-spectral images. To overcome this limitation, artificial 3-band images were created by duplicating the same band twice. Although the single-band image is numerically converted to a three-band image, there is no change in content. To minimize this limitation, images of different resolutions from different years covering the same regions were used. With this approach, which can be called imitating the multi-spectral image, no images containing three genuinely different spectral bands were included in the training, but it was as if different spectral bands of the same image were included in the training separately.
Another limitation is the lack of color information, which is due to the grayscale nature of the HAPs. This lack was minimized by using images with a wide range of intensities. Since different intensity values provide different grayscale tones, maximum use was made of intensity values that distinguish objects that resemble each other both within the same category and across different categories.
Another limitation of HAPs is that LR-HR image pairs are insufficient in content, which was overcome by using images of larger sizes. Depending on the years from which the data were obtained, there are only a limited number of classes. During the convolution process, the filters were thus able to gather information from larger images containing more diversity.
The proposed approach for the dataset structure is based on the hierarchy of photo interpretation elements. The hierarchy of photo interpretation elements is expressed with different levels. The first level involves color and tone information, which are more pronounced in bare land and forest areas found in orthophotos. The second level includes size, shape, and texture. Residential areas represent the group that reflects these elements the most. The third level includes patterns, with farmland areas being the group that best reflects this element. Within this framework, the dataset is structured as the 1st level consisting of bare land and forest areas, the 2nd level consisting of residential areas, and the 3rd level consisting of farmland areas. The 1993
image was also used in the approach to the dataset structure. Each of the three datasets was trained separately by means of the SRCNN model. Two different methodologies were used to obtain the final image from the separately trained datasets. In the first methodology, the final image was created by averaging the three enhanced images. In the second methodology, each enhanced image was divided into pieces of equal size. A reference-free image quality metric was calculated for each part obtained. The final image was created by concatenating the corresponding parts for which the quality metric gave better results.
Approaches to both dataset content and dataset structure were evaluated with reference-based image quality metrics as well as visual interpretation. In the content-based approach, both pixel-based and structural similarity-based metrics demonstrated positive progress. Evaluations made through visual interpretation also yielded results consistent with the image quality metrics. This approach was also effective in reducing the softening effect on the output image. In the structure-based approach, creating the final image based on the reference-free image quality metric gave better results. However, selecting the better image parts requires more advanced image processing techniques.
ASSESSING THE IMPACT OF SUPER-RESOLUTION ON ENHANCING THE SPATIAL QUALITY OF HISTORICAL AERIAL PHOTOGRAPHS
ÖZET
Image data is one of the important information sources of today's life. Regardless of the purpose, the field of use, and the acquisition hardware, the most important property of an image from the user's perspective is its quality. Although the term quality can be evaluated both quantitatively and qualitatively, it is directly related to resolution. Resolution expresses the level of distinguishability of the details in an image. In current studies, high-resolution (HR) images are more often preferred. However, not all usable images have sufficient resolution to fulfill their intended purpose. Since it is not always possible to acquire or procure an HR image due to hardware and cost constraints, lower-resolution (LR) images must be enhanced to a level that fulfills the intended use. This process is possible with techniques called super-resolution (SR).
Obtaining a higher-resolution image from an LR one is called SR. At this point, what is actually meant by the expression HR is better than LR. An LR image is accepted to be a degraded version of its HR counterpart. In other words, when degrading effects are applied to an HR image, some information will be lost. As a result, a lower-quality image will be obtained, which is called LR. However, the image in need of enhancement is the LR one, while the unavailable image is the HR one. Therefore, the transition from LR to HR is an inverse problem. To solve this problem, the information thought to be lost must be identified and restored to the LR image.
In SR studies, deep learning (DL) supported models are now used. Various network designs are employed to increase model performance and obtain better HR images. These are basically linear learning, residual learning, recursive learning, multi-scale learning, dense connections, generative adversarial networks, and attention mechanisms. DL-based SR studies first started with the Super-Resolution Convolutional Neural Network (SRCNN) model, in which linear learning was used. Afterwards, models applying a similar learning strategy with different network designs attempted to increase the success rate. After linear learning, models using residual learning gained prominence, with the perspective that deeper networks yield higher performance. At first, global residual learning, in which a direct connection is established between input and output, was preferred; later, as network architectures deepened, local residual learning using a large number of connections was used. Because the increased number of parameters of deeper networks is difficult to use in practical applications, recursive learning began to be used in SR studies. In recursive learning, which is based on the principle of controlling the total number of parameters through weight sharing, models ran much faster but brought along the vanishing gradient problem. In this context, densely connected models in which both residual learning and recursive learning are used together were proposed. Subsequently, visually high-quality images were obtained with generative adversarial network structures. Nowadays, SR studies concentrate on attention mechanisms. In summary, to increase model performance, learning strategies were changed, various loss functions were tested, and network architectures were modified with various hyperparameters. However, all this effort has been purely algorithm-based, and satisfactory results have in fact been achieved, especially with attention mechanisms. The issues not yet fully resolved in SR studies are that the more complex structures formed by deeper networks are impractical in real-time applications, and that models generally built on common datasets cannot show the expected performance in enhancing images for the solution of real engineering problems. For the former, the success rates of lighter network architectures must be increased. For the latter, specific approaches oriented toward solving the problem must be introduced.
In DL-supported SR studies, common datasets that are also frequently preferred in different fields of digital image processing are generally used. Among these, DIV2K is the most preferred dataset for training, while Set5 and Set14 are the most preferred for testing. Due to their content, these datasets are far from the solution of a real-world/engineering problem. Therefore, SR models built on these datasets are not always successful on datasets with different content. One of the data groups on which SR studies concentrate today is remotely sensed (RS) images. Because of the differences in their acquisition conditions, the areas they cover, and the details they contain, enhancing RS images with SR techniques is more complex. Existing models in the literature have been modified specifically for RS images in an attempt to obtain HR images. Numerous DL-supported SR models specific to RS images have been proposed with different learning strategies and network designs. In particular, studies carried out to enhance Landsat and Sentinel satellite images have made applications for the temporal detection of land use and land cover change more effective.
Within the scope of RS images, the images that have almost never been considered in SR studies are historical aerial photographs (HAP). In addition to the negative effects involved in enhancing RS images, HAPs have extra constraints. Information losses during the conversion of these images from printed copies to digital copies, the data acquisition hardware used depending on the technological means of the era, the number of bands, and the lack of color information are the main negative constraints. Since they play a key role in solving present-day problems related to the past, HAPs also need to be enhanced with SR techniques.
There are several main factors affecting the performance of a DL-supported SR model: the structure of the dataset, the loss functions, the image assessment criteria, and the network design. In terms of network design, a success rate sufficient to fulfill the intended use has to be achieved with lightweight networks designed with fewer parameters. In terms of assessment criteria, although there are studies in which pixel-based metrics disagree with visual interpretation, metrics that take structural similarity into account yield results consistent with visual interpretation. Loss functions are not a criterion on their own, but depending on the content of the dataset used and the preferred network design, they can affect model performance both positively and negatively. The factor that stands out at this point is the dataset to be used during training. A DL-supported SR model gathers information from the data step by step. Therefore, the structural quality and content richness of the dataset is the factor that directly affects the quality of the enhanced image.
In this thesis study, it is aimed to enhance grayscale HAPs with better performance using lightweight DL-supported SR models. In this context, approaches regarding the content and the structure of the dataset have been introduced. Orthophotos of different years and different resolutions obtained from the General Directorate of Mapping were used as the primary data source. The acquired orthophotos belong to 1954 with 30 cm resolution, 1968 and 1993 with 40 cm resolution, and 1982 with 10 cm resolution.
In the approach to dataset content, datasets were created by separately extracting images of the residential area, farmland, forested area, and bare land classes from the orthophotos. In this scope, the limiting effects of HAPs were minimized. These limiting effects are basically the number of spectral bands, the grayscale nature of the images, and the limited land use and land cover objects in the images.
Since DL-supported SR models are built on multi-band images, their direct use on HAPs is not possible. To overcome this limitation, artificial 3-band images were created by copying the same band twice. Existing SR models benefit from the different features offered by different spectral bands. Although the single-band image is numerically converted into a three-band image, there is no change in content. To minimize this limitation, images of different resolutions from different years covering the same region were used. With this approach, which can be called imitating the multi-band image, images containing three genuinely different spectral bands were not included in the training, but it was as if different spectral bands of the same image were included in the training separately.
Another limitation, the lack of color information, stems from the images being grayscale. Color information is important for distinguishing both objects in the same category and different objects that are similar despite belonging to different categories. The lack of color information in grayscale HAPs was minimized by using images with a wide intensity range. Since different intensity values provide different grayscale tones, maximum use was made of the intensity values that provide distinction for objects similar to each other both within the same category and across different categories.
Another limitation of HAPs, the insufficiency of LR-HR image pairs in terms of content, was overcome by using images of larger sizes. Depending on the years from which the data were obtained, only a limited number of classes exist. During the convolution process, the filters were thus able to gather information on images containing more diversity at larger image sizes.
The approach proposed for the dataset structure is based on the hierarchy of photo interpretation elements. The details in an image are in fact photo interpretation elements. Therefore, if the photo interpretation elements are enhanced with the SR technique, an HR image will have been obtained. The hierarchy of photo interpretation elements is expressed with different levels. The first level contains color and tone information; these elements are more pronounced in the bare land and forested areas found in the orthophotos. The second level contains size, shape, and texture. The group that reflects the information of these elements the most is residential areas. Buildings in particular carry the most distinctive features in terms of shape and size. The third level contains pattern, and the group that best reflects this element is farmland. Within this framework, the dataset was structured as level 1 consisting of bare land and forested areas, level 2 consisting of residential areas, and level 3 consisting of farmland. Each dataset was trained separately with SRCNN. Test images were enhanced separately with each trained model. Two different methodologies were applied to obtain the final image from the enhanced images. In the first, the final image was created directly by averaging the enhanced images. In the other methodology, each enhanced image was divided into identical parts starting from the top-left corner of the image. The aim was to reach a better final image by bringing together the spatially higher-quality ones among the corresponding parts. In this scope, a reference-free image quality metric was calculated for each part. The final image was created by bringing together the parts that gave better results.
In both approaches, reference-based image quality metrics as well as visual interpretation were used to evaluate the enhanced images. In the approach to dataset content, both pixel-based metrics and metrics taking structural similarity into account showed positive progress. In this approach, the results based on image quality metrics were also consistent with visual interpretation. In the approach to dataset structure, the second methodology produced better results in creating the final image. However, the stage of selecting the better of the corresponding parts requires more advanced image processing techniques.
1. INTRODUCTION
Image data is an important source of information in today's life. The information obtained through images and the inferences made based on them constitute the dynamic interaction between humans and facts. Users of all ages utilize various image data for different purposes in many diverse areas. Apart from daily amateur use, the main areas where images are used professionally are medicine, security, automotive, industrial control, remote sensing and mapping (Lillesand et al, 2015). Under these general headings, more specific usage areas are constantly developing. Along with the diversity in image usage areas, the equipment used to obtain them also varies. Regardless of the purpose, field of use, and acquisition hardware, one of the most important factors of an image from the user's perspective is quality. The term quality, which can be considered both qualitatively and quantitatively, is directly related to resolution (Aplin et al, 1997).
The concept of resolution is divided into 4 different categories. These are spatial resolution, spectral resolution, radiometric resolution and temporal resolution (Lillesand et al, 2015). Among these, temporal resolution refers to the frequency of acquisition of data, especially for remotely sensed (RS) images obtained from aircraft or satellite-based platforms. Therefore, it is not related to the level of distinguishability in the image. Spectral resolution is directly proportional to the number of bands in the image and inversely proportional to the bandwidth. Although it does not provide any information distinguishable by the human eye, it is important for image classification problems.
One of the resolutions that can be evaluated in terms of quality detectable by the human eye is radiometric resolution. The higher the bit value of the image, the higher the radiometric resolution. As presented in Figure 1.1, as the radiometric resolution increases, the gray level increases. Thus, the differences between pixel brightness values are better represented in a wider range.
Figure 1.1 : Different examples of radiometric resolution levels: 4-bit image (left) and 8-bit image (right).
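The effect of bit depth can also be sketched numerically. The following minimal example (a sketch assuming NumPy, with a random stand-in image rather than thesis data) requantizes an 8-bit image to 4 bits, collapsing 256 possible gray levels into 16:

```python
import numpy as np

img8 = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # 8-bit: up to 256 gray levels
img4 = (img8 >> 4) << 4                                     # keep only the top 4 bits: 16 levels
print(len(np.unique(img8)), len(np.unique(img4)))           # e.g., 256 vs at most 16
```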
The resolution directly linked to visual quality is spatial resolution, which is measured in pixels. Pixel density per unit area is informative about the spatial resolution level offered by the image. Thus, between two images of the same size, the one with more pixels has the better spatial resolution, as illustrated in Figure 1.2.
Figure 1.2 : Relationship between resolution and pixel density.
What is desired from an image is that it has the distinctiveness that will effectively and accurately fulfill its intended purpose. Areas where image data is used in today's conditions often require a high level of distinguishability. At this point, the relevant image must either be acquired with the desired quality by the sensor, or the current quality of the image considered low-resolution (LR) must be brought to, or at least approach, the desired high-resolution (HR) level (Nasrollahi and Moeslund, 2014). The former
among these is a hardware problem, and performance in terms of better image resolution is directly proportional to high costs (Park et al, 2003). The latter is an image enhancement problem within digital image processing, named Super-Resolution (SR).
SR is defined as obtaining an HR image from an LR image (Park et al, 2003) and has been a subject studied for many years (Dixit and Yadav, 2023). It is accepted that an LR image is actually a distorted version of an HR image. As a result of this degradation, the aim is to produce the HR image by recovering the lost information that is not in the LR one (Farsiu et al, 2004). SR is divided into two groups, single image SR (SISR) and multi-frame super-resolution (MFSR), depending on the number of images to be enhanced (Anwar et al, 2010). Since determining the relationship between the LR and HR feature spaces in MFSR is more complex, studies have concentrated on SISR (Chauhan et al, 2023). Among SISR studies, those improving spatial resolution are in the majority. Temporal resolution is independent of image quality. Radiometric resolution is somewhat related to visual quality, but it is not a parameter that can be improved within the scope of SR. Although studies have been carried out to improve spectral resolution in recent years (Mikamoto et al, 2023), the image data used are limited to hyperspectral images. In the following parts of the thesis, only the expression "resolution" will be used to refer to spatial resolution.
1.1 Literature Review
Resolution refers to the level of distinguishability of the details in an image. HR images are preferred for both amateur and professional use due to their clarity level. However, not all images used may have sufficient resolution to fulfill their intended use. Although there are various reasons for this situation, the equipment used and the shooting conditions are the main factors. Since it is not always possible to obtain or provide an HR image, the LR image needs to be improved (Park et al, 2003).
Obtaining an HR image from an LR image is called SR (Ha et al, 2019; Li et al, 2020; Ooi and Ibrahim, 2021). At this point, what is meant by the concept of HR is actually better than LR. It is assumed that an LR image is actually a degraded version of its HR counterpart (Anwar et al, 2020; Wang et al, 2020). In other words, when degrading effects are applied to an HR image, some information will be lost. As a result, a lower quality image will be obtained, which is called LR. However, the image that needs improvement is the LR one, and the image that does not exist is the HR one. Therefore, the transition from LR to HR is an inverse problem (Khoo et al, 2020). To solve this problem, the information thought to be lost must be identified and restored to the LR image (Chen et al, 2022). SR studies in the literature basically aim to detect the maximum amount of this lost information and to minimize negative factors such as smoothing in the HR image after adding the lost information to the LR image. In this context, the techniques and datasets used in SR studies are handled under separate headings.
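This inverse problem is commonly made explicit with the degradation model below (a standard formulation in the SR literature rather than one specific to the works cited here):

$$\mathbf{y} = (\mathbf{x} \otimes k)\downarrow_{s} + n$$

where $\mathbf{y}$ is the observed LR image, $\mathbf{x}$ the latent HR image, $k$ a blur kernel, $\otimes$ convolution, $\downarrow_{s}$ downsampling by scale factor $s$, and $n$ additive noise; SR attempts to invert this mapping.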
1.1.1 Super-resolution techniques
SR studies are focused around SISR. There are various categorizations for SISR in the literature. However, it is divided into 3 categories in terms of the infrastructure on which the technique used is based (Wang et al, 2022a). These are interpolation-based methods (Zhou et al, 2012), reconstruction-based methods (Khan et al, 2011) and learning-based methods (Ha et al, 2019; Khoo et al, 2020; Li et al, 2020; Wang et al, 2020; Ooi and Ibrahim, 2021). Interpolation-based techniques are the nearest neighbor, bilinear and bicubic methods (Siu and Hung, 2012). All of these enlarge the image depending on the scale factor. Although bicubic interpolation is known to be the best among the three, none of them are actually suitable for SR. One of the reasons for this is that after upscaling from LR to HR, there is a greater smoothing effect on the output image compared to other techniques. Another reason is that interpolation simply increases the size of the image as desired. For example, if the dimensions of a 128 x 128 image are increased by any interpolation technique for scale factors 2, 4, and 8, the dimensions of the images considered to be HR will be 256 x 256, 512 x 512, and 1024 x 1024, respectively, but in the meantime, no new information is added to the LR image (Khoo et al, 2020; Ooi and Ibrahim, 2021). Therefore, interpolation-based methods are not preferred for SISR today.
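As a minimal illustration of this point (a sketch assuming OpenCV, with a random stand-in for an LR patch), bicubic resizing enlarges the array for each scale factor but recovers no new detail:

```python
import cv2
import numpy as np

lr = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # stand-in for a 128 x 128 LR patch

for scale in (2, 4, 8):
    hr_size = (lr.shape[1] * scale, lr.shape[0] * scale)  # cv2.resize expects (width, height)
    upscaled = cv2.resize(lr, hr_size, interpolation=cv2.INTER_CUBIC)
    print(upscaled.shape)  # (256, 256), (512, 512), (1024, 1024): larger, but no new information
```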
Reconstruction-based methods, on the other hand, utilize image priors to recover the lost information. However, these methods cannot be used efficiently, since the knowledge included during the reconstruction process is minimal (Wang et al, 2022a).
SR techniques, which are frequently used and constantly evolving today, are learning-based methods. Learning-based SR techniques are divided into three categories as neighbor embedding, sparse coding, and deep-learning (DL) (Ooi and Ibrahim, 2021).
In the learning-based context, DL-based super-resolution techniques have become popular with the advancement of machine learning (Dixit and Yadav, 2023). With the help of algorithms used within the scope of DL-based SR, attempts were made to alleviate the vanishing gradient problem and to reduce running time by reducing the number of parameters (Ooi and Ibrahim, 2021). Efforts to increase model performance have enabled the use of various learning strategies in SR studies. These network design strategies are mainly linear learning, residual learning, recursive learning, multi-scale learning, dense connections, adversarial networks and attention mechanisms (Wang et al, 2020; Ooi and Ibrahim, 2021; Wang et al, 2022a). This is actually a superficial classification, and the strategies used are intertwined in many models. Therefore, in the literature, it is possible to come across a model using adversarial networks in the dense connection category, and it is also possible to come across a model in the dense connection category under residual learning.
DL-based SR studies first started with the model proposed under the name Super Resolution Convolutional Neural Network (SRCNN) (Dong et al, 2014). In this respect, SRCNN is the most basic DL-based SR model. After the SRCNN model, in which three convolutional neural network (CNN) layers are used with linear learning, Fast SRCNN (FSRCNN) (Dong et al, 2016) and Efficient Sub-Pixel Convolutional Neural Network (ESPCN) (Caballero et al, 2017) models have been proposed to increase the performance rate.
After linear learning, models that implement the residual learning strategy were proposed. It is assumed in residual learning that LR and HR are actually quite close to each other, so learning the residuals that express the small differences between them will be effective in recovering the lost information (Ooi and Ibrahim, 2021). In the first SR studies where residual learning was used, global residual learning, using a skip connection between input and output, was preferred. However, the models were deepened in order to obtain better enhanced images, and global residual learning became insufficient to recover the increasingly lost information. For this reason, local residual learning, which makes a connection every few stacked layers, has started to be used to connect the input to the output, instead of a direct connection. In fact, both use residuals to connect input and output, but the number of connections used to collect the high-frequency missing information differs (Wang et al, 2020; Ooi and Ibrahim, 2021).
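The idea behind global residual learning can be sketched as follows (an illustrative PyTorch module in the spirit of VDSR, with made-up layer counts, not a published architecture): the network predicts only the residual, and a skip connection adds it back to the interpolated LR input.

```python
import torch.nn as nn

class GlobalResidualSR(nn.Module):
    def __init__(self, channels=64, depth=8):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # The stacked layers predict only the high-frequency residual;
        # the skip connection adds it back to the bicubic-upscaled LR input x.
        return x + self.body(x)
```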
Residual learning used in DL-based SR studies gave more successful results as the model deepened. However, an adverse outcome of this situation is the high computational cost (Wang et al, 2020). Recursive learning aims to achieve, more quickly and with fewer parameters, the performance that residual learning achieves with a large number of parameters. For this purpose, the training process is completed by controlling the total number of model parameters through parameter sharing in recursive modules (Ooi and Ibrahim, 2021). While sharing the weight properties with the previous layer prevents the number of parameters from increasing, using the same module multiple times provides faster learning (Khoo et al, 2020). Despite all these positive aspects, the most effective approach to alleviate the vanishing gradient problem that occurs in recursive learning is residual learning (Wang et al, 2020).
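The parameter-sharing principle can be sketched as follows (illustrative PyTorch, not a published model): one block is reused at every recursion step, so the effective depth grows while the parameter count stays fixed.

```python
import torch.nn as nn

class RecursiveSR(nn.Module):
    def __init__(self, channels=64, recursions=8):
        super().__init__()
        self.embed = nn.Conv2d(1, channels, 3, padding=1)
        self.shared = nn.Sequential(  # a single block whose weights are reused
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.out = nn.Conv2d(channels, 1, 3, padding=1)
        self.recursions = recursions

    def forward(self, x):
        h = self.embed(x)
        for _ in range(self.recursions):
            h = self.shared(h)  # same weights on every pass: depth without new parameters
        return x + self.out(h)  # residual skip, as commonly combined with recursion
```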
The basic model of residual learning in SR is Very Deep Super-Resolution (VDSR) (Kim et al, 2016a), while for recursive learning it is the Deeply Recursive Convolutional Network (DRCN) (Kim et al, 2016b). As with FSRCNN (Dong et al, 2016) and ESPCN (Caballero et al, 2017) after SRCNN (Dong et al, 2014), derivative models such as the Enhanced Deep Residual Network (EDSR) (Lim et al, 2017), Residual Dense Network (RDN) (Zhang et al, 2018a), Deep Recursive Residual Network (DRRN) (Tai et al, 2017a), and Deep Residual Dense Network (DRDN) (Wei et al, 2019) were proposed based on VDSR (Kim et al, 2016a) and DRCN (Kim et al, 2016b). As stated in the review research by Ooi and Ibrahim (2021), VDSR gave better results than SRCNN for texture structure. EDSR reflected texture better than VDSR, but due to its higher number of parameters, the estimation time per image was longer. In terms of pattern regions, DRCN produced sharper edges than VDSR. DRRN, in turn, produced sharper edges than DRCN. An important point here is that the number of parameters of DRRN is 2 times less than that of VDSR and 6 times less than that of DRCN (Tai et al, 2017a; Dixit and Yadav, 2023). Therefore, going deeper does not always mean better for SR. Among these, RDN (Zhang et al, 2018a) gives better results both in terms of metrics and visually. In fact, it is still competitive with current models (Yu et al, 2023).
Another learning strategy used in DL-based SR is multi-scale learning. In the study conducted by Sun et al. (2021), it is stated that multi-scale learning can provide a better output image since images at different scales have different properties. It was thought
that with learning carried out in this way, more benefit could be made from the images used for training. However, in the study conducted by Li et al. (2018) in which multi-scale learning was used, no better results were obtained in terms of metrics than EDSR. When the scale factor was 2, very close and sometimes even the same values were obtained with EDSR, but as the scale factor increased, the difference increased in favor of EDSR. This also reveals that it is not better than RDN, and no visual comparison is included in this study.
After the mentioned methods, SR studies continued with generative adversarial networks (GAN) (Singla et al, 2022). Visually very successful results have been achieved with the Super-Resolution Generative Adversarial Network (SRGAN) (Ledig et al, 2017) model, which trains a generator and a discriminator adversarially. In fact, in the study carried out with SRGAN (Ledig et al, 2017), it was shown that the image with the higher pixel-based metric value is not always visually better.
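The generator objective in this family can be sketched as follows (a simplified illustration: SRGAN combines a content term, in the original a VGG-based perceptual loss, here reduced to plain MSE, with an adversarial term that rewards fooling the discriminator):

```python
import torch
import torch.nn.functional as F

def generator_loss(sr, hr, disc_sr_logits, adv_weight=1e-3):
    content = F.mse_loss(sr, hr)  # simplified content term (SRGAN itself uses a VGG perceptual loss)
    adversarial = F.binary_cross_entropy_with_logits(
        disc_sr_logits, torch.ones_like(disc_sr_logits))  # push generated images toward "real"
    return content + adv_weight * adversarial
```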
As the name suggests, dense connection refers to complex models containing a large number of connections. SRDenseNet, proposed by Tong et al. (2017), is the most well-known DL-based SR model using dense connectivity. However, RDN (Zhang et al, 2018a), where local and global residual learning are used together, can also be considered in the category of densely connected models. Moreover, the ESRGAN (Wang et al, 2018) model, which is an enhanced version of SRGAN, can be considered in the dense connection category, even though it is trained with GANs. Other well-known densely connected DL-based SR models are MemNet (Tai et al, 2017b) and the Deep Back-Projection Network (DBPN) (Haris et al, 2018).
Attention mechanisms (Zhu et al, 2021) are the structures on which current SR studies focus. They are generally built on EDSR (Lim et al, 2017) and RDN (Zhang et al, 2018a). Among the models that use these mechanisms, the Residual Channel Attention Network (RCAN) (Zhang et al, 2018b) and the Second-Order Attention Network (SAN) (Dai et al, 2019) stand out. Although RCAN and SAN were not significantly better than RDN in terms of image quality metrics, they gave much better results visually (Al-Mekhlafi and Liu, 2024).
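The channel-attention building block behind RCAN can be sketched as follows (an illustrative squeeze-and-excitation-style module in PyTorch; layer sizes are assumptions, not the published configuration):

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # "squeeze": one global statistic per channel
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(self.pool(x))  # per-channel weights in (0, 1)
        return x * w               # rescale channels by their learned importance
```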
SR models in the literature are not limited to the ones mentioned. There are a multitude of SR models that have been proposed using a wide variety of datasets for different purposes (Wang et al, 2021a). In order to improve algorithm performance, learning strategies are changed, different loss functions are tested, and network architectures are updated with various hyperparameters. However, efforts to obtain better images through purely algorithm-based updates and modifications have entered a cycle, since satisfactory results were achieved in creating HR from LR, especially after attention networks. Moreover, the more complex structures of deeper networks generally provide better performance; however, their use in real-time applications is impractical (Ha et al, 2019). For this reason, the direction of studies is towards achieving similar performance rates with lightweight networks (Gendy et al, 2023) and using these networks to solve real-world problems (Chen et al, 2022). The distinctive feature of lightweight networks is that the number of parameters used is smaller (Ahn et al, 2018). Thus, the aim is for the model to run faster. The number of parameters and the operating speed of a model are generally inversely proportional. However, considering the numerical values given in Table 1.1, most of the basic SR models are not lightweight. Although much higher quality images are obtained with increasing accuracy, the practicality of the applications is controversial.
Table 1.1 : Total number of parameters for the basic SR models.

SR Model                      Learning Strategy     Num. of Parameters
SRCNN (Dong et al, 2014)      Linear                57 K
VDSR (Kim et al, 2016a)       Residual              665 K
DRCN (Kim et al, 2016b)       Recursive             1.8 M
DRRN (Tai et al, 2017a)       Recursive             297 K
EDSR (Lim et al, 2017)        Residual              43 M
DBPN (Haris et al, 2018)      Dense Connection      10 M
RDN (Zhang et al, 2018a)      Residual + Dense      22.6 M
RCAN (Zhang et al, 2018b)     Attention             16 M
SAN (Dai et al, 2019)         Attention             15.7 M
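As a sanity check on the smallest entry (a sketch assuming SRCNN's 9-5-5 configuration with 64 and 32 filters on a single-channel input), the ~57 K figure can be reproduced by hand:

```python
def conv_weights(k, c_in, c_out):
    return k * k * c_in * c_out  # weights only; biases add a negligible amount

total = (conv_weights(9, 1, 64)     # patch extraction and representation
         + conv_weights(5, 64, 32)  # non-linear mapping
         + conv_weights(5, 32, 1))  # reconstruction
print(total)  # 57184, i.e., the ~57 K entry for SRCNN above
```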
1.1.2 Datasets in super-resolution
Studies on SISR have been carried out on various datasets for a long time. These are generally common and freely available datasets that are mostly preferred in many digital image processing problems, not just SR. Table 1.2 provides basic information about common datasets. Among them, DIV2K (Agustsson and Timofte, 2017) is the most preferred dataset for training purposes, while Set5 (Bevilacqua et al, 2012) and Set14 (Zeyde et al, 2010) are the most used datasets for testing purposes.
Figure 1.3 and Figure 1.4 present sample images from the DIV2K and Set5/14 datasets. Except for Urban100 and Manga109, the other common datasets are similar in content. The difference is usually in the total number of images and the image sizes.
Table 1.2 : Common and freely available datasets.

Name                                   Num. of Images   Image Size     Format   Content
DIV2K (Agustsson and Timofte, 2017)    1000             (1972, 1437)   png      People, Animal, Scenery
T91 (Yang et al, 2010)                 91               (264, 204)     png      Flower, Face, People
BSD100 (Martin et al, 2001)            100              (481, 321)     jpg      Animal, Scenery, Plant
Urban100 (Huang et al, 2015)           100              (984, 797)     png      Construction, Architecture
Set5 (Bevilacqua et al, 2012)          5                (313, 336)     png      Baby, Butterfly, Bird
Set14 (Zeyde et al, 2010)              14               (492, 446)     png      Baboon, Bridge, Foreman
Manga109 (Fujimoto et al, 2016)        109              (826, 1169)    png      Comics
Figure 1.3 : Examples of training images from the datasets in the literature.
Figure 1.4 : Examples of test images from the most used datasets in the literature.
Practices specific to DL-based SR with these datasets have always aimed to improve algorithm performance. Although better images were obtained with more efficient algorithms, the solution of real-world/engineering problems remained in the background due to the content of the images (Chen et al, 2022). Apart from the common datasets, several datasets have been created to solve real-world problems in different disciplines. For the solution of a real-world/engineering problem, magnetic resonance images (Van Reeth et al, 2012), which are necessary in medical imaging, and RS images, which are the main data source for land use and land cover (LULC), have been preferred. Between the two, the latter are the focus of attention, as they relate to environmental issues and earth sciences. In this context, there are datasets (Deeba et al, 2020; Wang et al, 2022b) prepared by researchers, as well as datasets (Dai et al, 2010; Zou et al, 2015; Xia et al, 2017) that are readily available to researchers. Information on common RS image datasets used in DL-based SR studies is presented in Table 1.3. The main differences between the datasets are the size, format, number of images and the details they contain. Figure 1.5 presents sample images from the WHU-RS19 dataset.
Table 1.3 : Common and freely available datasets consisting of RS images.

Name                                Num. of Images   Image Size   Format   Content
AID (Xia et al, 2016)               10000            (600, 600)   jpg      Airport, Bareland, Desert
RSSCN7 (Zou et al, 2015)            2800             (400, 400)   jpg      Residential, Farmland
RSC11 (Zhao et al, 2016)            1232             (512, 512)   tiff     Residential, Roads
WHU-RS19 (Dai and Yang, 2010)       1005             (600, 600)   tiff     Bridge, Forest, Building
UC Merced (Yang and Newsam, 2010)   2100             (256, 256)   png      Farmland, Highways
Figure 1.5 : Examples of sample images from the WHU-RS19 dataset.
In the literature, there are studies that directly use models created on commonly used datasets of optical images, as well as SR approaches specifically designed for enhancing satellite images (Huang et al, 2017; Ren et al, 2021). Liebel and Körner (2016) first used DL-based SR for RS images; Sentinel images constituted the RS data used in their work. As models are built on frequently used common datasets, they do not always produce the expected results in independent applications with different scene characteristics (Huang et al, 2017). Therefore, RS images, which are more complex than common datasets both in terms of the resolution concept and qualitative quality, have required different SR approaches (Zhang et al, 2022). Huang et al. (2017) modified VDSR on a data-specific basis to improve RS images. To overcome the complex structure of RS images, Lei et al. (2017) proposed the LGCnet model, in which local and global information are trained separately and then combined. Tuna et al. (2018) evaluated SRCNN and VDSR together with the IHS transformation on SPOT and Pleiades images. There are various CNN-based (Jiang et al, 2018; Deeba et al, 2020; Ren et al, 2021), GAN-based (Jiang et al, 2019; Liu et al, 2020; Guo et al, 2022), and attention-based (Guo et al, 2019; Haut et al, 2019; Wang and Sertel, 2021b) model suggestions in the literature for improving RS images with SR techniques.
Some studies in the literature aim to improve details that can already be distinguished in LR RS images (Deeba et al, 2020); in others, however, the satellite images used, such as Landsat and Sentinel, may be insufficient for high-level detail extraction and scene analysis. In particular, enhancing Landsat images (Wagner et al, 2021), which provide older data compared to Sentinel (Lanaras et al, 2018), has allowed the expansion of the time intervals for multi-temporal LULC change detection studies. Previous research reported higher accuracy in classifications conducted with the enhanced images (Cheng et al, 2017; Shermeyer and Van Etten, 2019). The increase in detail level both makes visual interpretation safer and increases classification and detail extraction accuracies (Truong et al, 2020).
1.2 Purpose and Hypotheses of the Thesis
The data group that is almost never evaluated in SR studies is grayscale historical aerial photographs (HAP). HAPs, which are in a key position in solving present-day problems related to the past, also have restrictive effects in addition to the negative effects found in other RS images. As with satellite images, the quality of a HAP is conditioned by the scene as well as by environmental, lighting, and atmospheric conditions (Wang et al, 2022a). As distinct from satellite images, it is possible to lose information when converting these photographs from hard-copy versions to digital form through scanning procedures. Apart from these, the data acquisition equipment of the period, the number of channels in the image and the lack of color information are other negative limitations of HAPs.
The literature lacks a detailed study on improving grayscale HAPs, which is important for solving real-world problems related to the past. HAPs can be traced back 100 years; as they span such a large time range, this data group is extremely useful for long-term monitoring of the environment. When this monitoring includes change detection applications, where historical data should be analyzed together with current and HR datasets, the historical data need to be improved spatially in order to achieve more reliability. However, restrictive factors arising from the nature of the data make it difficult for an SR model to learn the relationships between HR-LR pairs. In order to use these images effectively in the mentioned applications, these factors must be minimized and edge details must be preserved as much as possible.
HAPs are a data group with practical applications in solving real-world/engineering problems. SR models that provide much better results in terms of both visual and metric quality may not be applicable in practice due to their high number of parameters, which is one of the biggest contributors to their current success (Kim et al, 2021). As stated by Ooi and Ibrahim (2021), the most important factors that determine the performance of an SR model are the dataset used in training, the loss functions, the assessment criteria and the network design. In this context, it is essential to achieve sufficient success with lightweight networks designed with fewer parameters. In terms of evaluation criteria, although pixel-based metrics may give results inconsistent with visual interpretation, there are almost no studies in which visual interpretation is incompatible with metrics that take structural similarity into account. Loss functions are not a criterion on their own, but can affect the results in both directions depending on the content of the dataset used and the preferred network design. The factor that stands out at this point is the dataset used during training. A DL-based model collects information from data step by step. Therefore, the robustness of the dataset directly determines the quality of the image to be improved with SR (Wang et al, 2022a).
In this context, the aim of the thesis is to better improve HAPs with DL-based lightweight SR models, through approaches to the dataset content and structure. In this regard, it is intended to increase the performance of existing lightweight SR models by increasing the quality of the dataset to be created from HAPs. The approaches put forward in this direction are essentially as follows:
1) The images in the dataset to be used during the training of HAPs with a DL-based SR model should imitate the multispectral image by having different resolution values from different years, the color information should be imitated by representing the images in a wide intensity range, and the image sizes in the training data should be larger than those in the RS datasets used in the literature.
2) If a data set prepared based on hypothesis number 1 is grouped according to the hierarchy of photo interpretation elements (PIEs) and trained independently of each other, and if the relevant parts of the image to be improved are reconstructed with the training directly related to it, a higher quality image will be obtained both visually and metrically.
HISTORICAL AERIAL PHOTOGRAPH
The data obtained using aerial imaging tools are called aerial photographs. In the beginning, the tools used for acquisition were pigeons, balloons, and rockets. Later, photographs taken from aircraft during the world wars were used. Currently, HR aerial photographs are obtained with UAVs as well as aircraft-based platforms far more advanced than in the past. This development is categorized into different stages depending on the platform and the type of photography. As presented in Figure 2.1, it took place in four stages: graphical photogrammetry, analog photogrammetry, analytical photogrammetry, and digital photogrammetry. In graphical photogrammetry, which spanned 1850 to 1900, photographs were evaluated on a drawing board (Gates, 1984). The period between 1900 and 1960 is covered by analog photogrammetry (Cooper and Robson, 1994), in which analog photographs were evaluated with optical-mechanical devices. In the analytical photogrammetry period between 1960 and 2000, photographs were still analog, but the evaluation tools were computers (Doyle, 1964). Nowadays, the evaluation tool is still the computer, but the photographs used are in digital form.
Figure 2.1 : Historical development of photogrammetry.
The vast majority of data used in photogrammetric studies are digital aerial photographs. Some of the reasons for this choice are that data are much easier to obtain than in the past, can be used more effectively thanks to HR and higher spatial accuracy, and the hardware used is more compact, which in turn eases access to data. Even mini UAVs can be used in documenting cultural heritage (Bakirman et al, 2020), mapping (Ludwig et al, 2020; Štroner et al, 2021), and many other areas (Kovanič et al, 2023). However, the use of aerial photographs obtained with older technological means is not over. Photographs taken at earlier dates, an example of which is presented in Figure 2.2, have an archival nature and are an important data group in solving current problems related to the past. The main difference between old aerial photographs and those available today is their quality, both visually and metrically. Therefore, the current resolution of HAPs needs to be improved in order to use them more effectively in today's studies. However, minimizing the limiting effects inherent in HAPs is important for the performance of enhancement with the SR technique.
Figure 2.2 : An example of HAP from Vermont Data Portal (Url-1).
The first of the limiting effects for HAPs is the number of spectral bands. Compared to other optical images, HAPs are single-band data. Therefore, operations such as simultaneously utilizing existing RGB bands or taking into account correlations between different spectral bands, as in channel attention structures, are not possible for HAPs. Since the highlights of each spectral band cannot be combined to obtain a better result image, the maximum information that can be extracted is limited to a
single band. Moreover, DL-based SR models are built on multispectral images. This limits the direct use of existing models on HAPs in terms of running the algorithm.
A direct consequence of the single-band condition is that the image is grayscale. Color information, which creates distinguishability among objects and the differences between them, is important for SR as well as for other digital image processing applications such as classification and extraction of LULC features. The lack of color information negatively affects the content richness of the dataset. When two objects that are identical or similar in shape and size are both grayscale, the difference between them diminishes. This is a factor that makes it difficult for the SR model to learn different features during training.
Another limiting effect of HAPs is the limited number of LULC classes. Because old dates are involved, certain LULC classes may be limited or absent in HAPs. Due to urbanization, residential areas are more common in more recent HAPs, which also means there is less agricultural land in them. Conversely, it is difficult to find residential content in HAPs from older years; older photographs contain more bare land and forest areas. In summary, the content-related problem is that it is difficult to obtain details from different years for the same class. Another problem depending on the acquisition date is that HR and LR image pairs for HAPs are not exactly comparable to the datasets present in the literature. A HAP that is considered theoretically HR may be objectively LR compared to current optical images. Therefore, it is more difficult for a model to learn the difference between the two and recover the lost information.
Another issue in terms of SR, not specific to HAPs but common to optical RS data, is what the concept of resolution means. There are four different resolutions for image data: spatial, temporal, spectral, and radiometric. In SR applications on data other than optical remote sensing, a specific resolution value is usually not specified; what is meant by resolution for such data is the clarity of the image. For optical RS data, the first thing that comes to mind is spatial resolution, which refers to the level of detail. However, as presented in Figure 2.3, the resolution value given for optical data is not valid for the entire image, and the ground sample distance (GSD) gives more realistic information about the level of detail.
Figure 2.3 : Relationship between spatial resolution and GSD (Adapted from Url-2).
For scenario 1 in the related figure, spatial resolution and GSD mean almost the same thing. The situation is different for the second and third scenarios. In the second scenario, on a topography assumed to be flat, the GSD value differs for land cover and land use objects of different heights and sizes. Therefore, the level of detail in terms of distinguishability is different for each. The GSD value also differs for land use objects located at different elevations of the topography, even though they are of the same or similar size and height, as in the third scenario. Therefore, in these scenarios, a single spatial resolution value cannot be accepted for the entire image. For this reason, when improving a HAP with SR, regions with lower quality than the nominal spatial resolution are also being improved. As presented in Figure 2.4 for a HAP, photograph shooting distances differ from each other because objects of similar sizes are located at different heights. Therefore, although they are in the same photo, their resolutions differ in terms of clarity.
Figure 2.4 : Details of different clarity in the same photo.
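Although the thesis does not state it explicitly, the variation described here follows from the standard photogrammetric relation for the ground sample distance, given here as supporting background:

GSD = \frac{p \cdot H'}{f}

where p is the physical pixel (or scan) size, f is the focal length of the camera, and H' is the distance between the camera and the imaged object. Since H' changes with object height and terrain elevation, the effective GSD varies across a single photograph.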
Since the creation of the dataset was not done randomly from start to finish, four orthophotos covering a larger area were used rather than a large number of individual photographs. The orthophotos cover a part of the province of Istanbul, and each covers exactly the same area. The orthophotos used are presented in Figure 2.5, and how the data were manipulated to obtain photograph data is explained in the next section within the scope of practical applications.
Figure 2.5 : Grayscale historical orthophotos: (a) Year: 1954, Resolution: 30 cm; (b) Year: 1968, Resolution: 40 cm; (c) Year: 1982, Resolution: 10 cm; (d) Year: 1993, Resolution: 40 cm.
SUPER-RESOLUTION PRACTICES
Information is extracted from RS digital images in accordance with the purpose. This process is basically carried out through visual analysis, where images are interpreted by experienced professionals, and through image processing algorithms that apply mathematical and statistical relationships in the computer environment. As stated in the purpose of the thesis, the aim is to make HAPs more effectively interpretable within the scope of visual analysis and to provide higher-resolution input data for more detailed analyses such as classification or feature extraction. In this context, different hypotheses have been developed for both purposes, and the practical applications for each are presented under different subheadings. The subheadings in this section explain the steps taken toward the purposes of the thesis and provide information about the models used.
Approach to Dataset Content
A DL-based SR model should be trained on a dataset from which as many different features as possible can be learned. The dataset to be prepared should include different LULC details. Due to the limitations of HAPs mentioned in the previous section, it is essential to represent them with a more specific dataset than other optical remote sensing data. If the dataset is insufficient in terms of both content and number of images, it does not matter which SR model is used; in such a case, either overfitting or underfitting is likely to occur during training. Therefore, the more effectively the restrictive effects are minimized, the more balanced the dataset will be, which will be reflected objectively and subjectively in the quality of the resulting image. In this context, two different datasets (dataset-1 and dataset-2) were created from the existing orthophoto images, and it was determined how the proposed dataset structure affected the output image of an SR model. The applied methodology is illustrated in Figure 3.1.
Figure 3.1 : Applied methodology for the approach to dataset content.
Orthophotos from 1954, 1968, and 1982 were used for the approach regarding dataset content. The orthophotos slow down data processing because they are in TIFF format and cover a large area. To reduce the computational cost, each orthophoto was divided into four equal parts. Since each part of the 1982 image was still heavy data due to its resolution, it was divided into equal parts once more, as sketched below.
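A minimal sketch of this tiling step (Python with Pillow; the file names are hypothetical and the orthophotos are assumed to be plain single-band TIFFs readable by Pillow):

    from PIL import Image

    Image.MAX_IMAGE_PIXELS = None  # large orthophotos exceed Pillow's default size guard

    def split_into_quarters(path):
        """Split one orthophoto into four equal tiles to reduce memory load."""
        img = Image.open(path)
        w, h = img.size
        boxes = [(0, 0, w // 2, h // 2), (w // 2, 0, w, h // 2),
                 (0, h // 2, w // 2, h), (w // 2, h // 2, w, h)]
        for i, box in enumerate(boxes):
            img.crop(box).save(f"{path.rsplit('.', 1)[0]}_part{i}.tif")

    for year in (1954, 1968, 1982):
        split_into_quarters(f"orthophoto_{year}.tif")  # hypothetical file names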
Four different LC classes were determined for extracting image parts from the orthophotos: bare land, farmland, residential areas, and forested regions. Due to the increase in urbanization over the years and the decrease in bare land, agricultural areas, and forest areas, residential areas are mostly seen in the 1982 image. However, since its pixel sizes are 3-4 times smaller than those of the other two images, it is also a rich image for the other LC classes. In the images from 1954 and 1968, the abundance of bare land and forest areas stands out.
The images to be used during training were extracted from the sub-images obtained from the orthophotos. At this point, images of the desired dimensions were generated by random cropping and manual cropping. Considering that an orthophoto is produced from a large number of photographs, each cropped image is a HAP. While applying the random cropping process, it was required that each image produced have
a maximum overlap of 10% with all images produced before it. In a way, this was an implementation of shifting, a geometric data augmentation technique for images. The reason for extracting the training data in this way was to prevent the dataset from being prone to overfitting by keeping the similarity between images at a certain level; a sketch of this sampling logic is given below. Manual cropping was used in sections where the same region was represented by images from at least two different years. This process was also used within the scope of minimization. There are more images from 1982 in the training data, both because they are richer in content and because they represent the same area with more pixels. The 1954 and 1968 images were fewer in number, as they come from older years and have fewer pixels covering the same area. Some images derived from the orthophotos and included in the training dataset are presented in Figure 3.2.
Figure 3.2 : Sample images derived from orthophotos and included in the training dataset.
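A minimal sketch of the overlap-constrained random cropping (Python; the rejection-sampling scheme is an assumption, as the thesis does not describe the exact mechanism):

    import random

    def overlap_ratio(a, b, crop_area):
        """Fraction of the crop area shared by two (x1, y1, x2, y2) boxes."""
        dx = min(a[2], b[2]) - max(a[0], b[0])
        dy = min(a[3], b[3]) - max(a[1], b[1])
        return max(dx, 0) * max(dy, 0) / crop_area

    def sample_crops(img_w, img_h, crop, n, max_overlap=0.10, max_tries=100000):
        """Draw up to n random crop boxes whose pairwise overlap stays at or below 10%."""
        boxes, tries = [], 0
        while len(boxes) < n and tries < max_tries:
            tries += 1
            x = random.randint(0, img_w - crop)
            y = random.randint(0, img_h - crop)
            cand = (x, y, x + crop, y + crop)
            if all(overlap_ratio(cand, b, crop * crop) <= max_overlap for b in boxes):
                boxes.append(cand)
        return boxes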
An important nuance when creating a training set from this type of data is the number of images. The lack of color information increased the similarity of objects in the same class to each other, and this was also valid for objects in different classes. Because it was thought that visual similarity would negatively affect learning after a while, the total number of images included in training was limited to 1000 for both datasets. As can be seen from Figure 3.1, dataset-1 was prepared to minimize the limiting effects in HAPs. Table 3.1 shows the proposed solution for each limiting effect. The scope of each recommendation is discussed in detail below.
Table 3.1 : Limiting effects and minimizing solutions.

SR Model (limiting effect)     Learning Strategy (minimizing solution)
Single-band                    Multi-spectral data imitation
Lack of color                  Wide intensity range
Content of HR-LR pairs         Larger image sizes
The most obvious limitation of HAPs was that the image is single-band, which causes problems in two respects. One is running the algorithm of a DL-based SR model; the other is that the details in the image are characterized by a single band. For the former, creating two copies of the same band and then combining them with the original is a simple but safe solution for running the algorithm, as sketched below. Current DL-based SR models were created for multispectral images; although it is possible to arrange a model to train on a single-band image, this is a more complex process, and the data may already be provided by the producer with the same band placed in three channels. For the latter, the multispectral image structure was imitated. Figure 3.3 represents a multispectral image and the three different bands it contains. As can be seen from the figure, due to reflection differences, the three bands are three different images with different features for a DL-based SR model.
Figure 3.3 : Multi-spectral image and separate bands.
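A minimal sketch of the band-replication workaround described above (Python with NumPy):

    import numpy as np

    def to_three_channels(gray):
        """Stack one grayscale band into three identical channels, (H, W) -> (H, W, 3),
        so that models built for multispectral input can run unchanged."""
        return np.stack([gray, gray, gray], axis=-1)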
An SR model built on multispectral images collects information from each band of each image during training. As stated in the literature, the attention structures used in today's SR studies take into account the correlation between the spectral bands in each image and the prominent features of each during training. Therefore, representing a region with different bands is important and necessary for SR. For this reason, to overcome single-band representation in terms of content, photographs from different years covering the same regions were used together. Details within LC classes have changed to varying degrees over time. If details that have not undergone much structural change are found in orthophotos from three different years, it is as if three different bands of the same image are included in the dataset. An example of this situation is presented in Figure 3.4. However, due to the passage of time, some objects in LC classes may be present in the older image but not in the newer one, and vice versa. In such cases, a two-band image imitation was performed, and thus two spectral variations were obtained for regions representing the same content in the dataset. These operations were not applied to dataset-2.
Figure 3.4 : Imitating multi-spectral data.
Another disadvantage that must be minimized for HAPs was grayscale representation, a natural consequence of being single-band. Representation in grayscale also means a lack of color information. The differences between the images in the training dataset constitute the diversity for the model to learn. In addition to representing the details in an LC class with different images, the color information they contain is one of the factors that provides differentiation. This difference can be easily detected in Figure 3.5. Color information that is absent in a grayscale image reduces the difference between the images in the dataset. This causes different photographs to display higher similarity than color images, even though they contain different regions.
Figure 3.5 : Importance of color information for the diversity in an image.
To minimize the limitation regarding the lack of color, the variation provided by color information was imitated by means of intensity values. The brightness value in multispectral images, which is the numerical equivalent of color information, is replaced by intensity values in grayscale images. These values were used to control details and brightness adjustments in grayscale images. For dataset-1, as wide an intensity range as possible was used for images belonging to the four LC classes. For the other dataset (dataset-2), images represented in a narrower range, containing more homogeneous intensity values, were used, so the distinguishability lost with color information was not compensated there. A visual depiction of the intensity distributions of the images in both datasets is presented in Figure 3.6, and a screening sketch follows it.
Figure 3.6 : Intensity distributions of dataset-1 and dataset-2.
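A minimal sketch of how candidate images could be screened by intensity spread when assembling dataset-1 (Python with NumPy; the percentile measure and the threshold of 120 gray levels are illustrative assumptions, not values from the thesis):

    import numpy as np

    def intensity_span(gray, low_pct=1, high_pct=99):
        """Robust intensity range of a grayscale image (1st-99th percentile)."""
        lo, hi = np.percentile(gray, (low_pct, high_pct))
        return hi - lo

    def keep_for_dataset1(gray, min_span=120):
        """Keep images spanning a wide intensity range; narrow-range images
        would instead resemble the homogeneous dataset-2 content."""
        return intensity_span(gray) >= min_span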
Another limitation whose effect was minimized is the inadequacy of the image quality obtainable for HR-LR pairs, which depends on the technological conditions of the period, together with the restricted number of LC classes, which depends on when the data were acquired. The quality of an RS image also depends on the sensor properties. The sensors in use when HAPs were acquired were not as advanced as those used today. Therefore, the images themselves, which contain the features the model must learn for improvement, are already deficient in quality compared to the data used in today's SR studies. Considering that images under these conditions contain a limited number of LC classes, the image presented as input is not a strong basis for a model to learn from. To minimize this limitation in dataset-1, higher-dimensional images were
used compared to the images in the optical datasets referenced in the literature. While images with a fixed size of 512 x 512 were prepared for dataset-2, dataset-1 contains images of varying sizes, mostly between 1200 x 1200 and 2400 x 2400. Even though these images are scaled to much smaller sizes as input during training, the diversity in the region traversed by each kernel was increased in the cropped images. A visual presenting the difference between input images in the two datasets is given in Figure 3.7.
Figure 3.7 : Training image in dataset-1 (1600 x 1600) and dataset-2 (512 x 512).
Approach to Dataset Structure
The approach to dataset structure is based on the principle of the hierarchy of PIEs. Understanding various features and extracting information by analyzing the content of a photo is called photo interpretation. In other words, photo interpretation is the evaluation of images with the human eye and the detection of different features based on that evaluation. These features can be spatial, spectral, and temporal, with most of the elements related to spatial characteristics. As presented in Figure 3.8, PIEs are categorized according to the complexity of the elements: primary, secondary, tertiary, and higher elements. Each group has its own sub-elements. Tone and color constitute the primary elements; both are related to the spectral reflectance characteristics of objects. Therefore, differences of tone and color increase the
distinguishability of objects from each other. Secondary and tertiary elements are spatial arrangements of tone and color. Among them, size indicates the scale of an object relative to other objects. Shape is the general form of an object defined by its physical outer boundaries. Texture is the frequency of tonal change in the image. Pattern is the spatial arrangement of objects.
Figure 3.8 : Photo interpretation elements and their hierarchy.
Models built on common datasets do not always yield positive results when used directly in their original form on RS images. For this reason, SR models have been constantly modified: some researchers have reduced the number of parameters, while others have made the models more complex. The approach recommended in this section is to create the dataset structure in accordance with the characteristics of the image to be improved, regardless of the model to be used. The basic idea is this: details in an image consist of interpretation elements; therefore, improving the interpretation elements is actually improving the image. These elements are especially valuable for HAPs among RS images.
The HAPs in the thesis study were created in four categories: residential areas, farmland areas, forested areas, and bare land. Among these, the classes where tone and color changes are most evident are bare land and forested areas; therefore, these two classes were accepted as the primary category of the PIEs. The class that reflects the shape, size, and texture elements is residential areas, which was accepted as the secondary category. Among the elements in the tertiary group, it is pattern that provides the most useful information for interpretation, and it is the farmland areas that carry information about this element. For this reason, the
farmland areas class was accepted as the tertiary category of the PIEs. For each category, separate datasets were created using images of the relevant classes, following the approach applied to the dataset content. Unlike the first approach, images derived from the orthophoto of 1993 were also used. A visual of this structural approach is presented in Figure 3.9. The methodology applied subsequently is presented in Figures 3.10 and 3.11.
The advantage of models with fewer parameters is that they work quickly; their disadvantage is that they can handle a limited number of images. An increasing number of images during training may make lightweight models prone to overfitting. For this reason, instead of training a total of 3000 images belonging to three categories at once, the 1000 images of each category were trained separately, which was considered more convenient for practical applications.
In the methodology presented in Figure 3.10, three different trained models are obtained from three different datasets. The same test image was improved with each model. The three improved images obtained for one test image were then combined by averaging to create the final image, as sketched below.
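A minimal sketch of this averaging step (Python with NumPy), assuming the three model outputs are 8-bit arrays of identical size:

    import numpy as np

    def fuse_by_average(enhanced_images):
        """Pixel-wise average of the per-category SR outputs -> final image."""
        stack = np.stack([img.astype(np.float64) for img in enhanced_images])
        return np.clip(np.rint(stack.mean(axis=0)), 0, 255).astype(np.uint8)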
In the methodology presented in Figure 3.11, an image quality metric was used instead of directly averaging the resulting improved images. In this context, each enhanced image was divided into a 5 x 5 grid of equal parts, starting from the upper left corner. A reference-free image quality metric was calculated for each corresponding part. The metric used is the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) (Mittal et al, 2012). BRISQUE takes values typically between 0 and 100 and determines the quality of an image without a reference image. It measures quality based on certain features, in particular the natural scene statistics in the image, and evaluates characteristics such as contrast, sharpness, and saturation. The combination of these features produces a score that represents the perceived quality of an image; a lower BRISQUE value represents higher quality. A sketch of this tile-wise selection is given below.
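A minimal sketch of this BRISQUE-guided fusion (Python with NumPy; brisque_score is a placeholder for any scoring function, e.g. the BRISQUE implementations in the opencv-contrib or piq packages, and the image dimensions are assumed divisible by the grid size):

    import numpy as np

    def fuse_by_brisque(enhanced_images, brisque_score, grid=5):
        """For each cell of a grid x grid partition, keep the tile with the
        lowest (best) BRISQUE score among the candidate enhanced images."""
        h, w = enhanced_images[0].shape[:2]
        th, tw = h // grid, w // grid
        out = np.empty_like(enhanced_images[0])
        for r in range(grid):
            for c in range(grid):
                ys, xs = slice(r * th, (r + 1) * th), slice(c * tw, (c + 1) * tw)
                tiles = [img[ys, xs] for img in enhanced_images]
                out[ys, xs] = min(tiles, key=brisque_score)
        return out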
Figure 3.9 : Structural approach to the dataset based on the hierarchy of photo interpretation elements.
Figure 3.10 : Methodology based on average concatenation of the individually enhanced images.
Figure 3.11 : Methodology based on the BRISQUE values of the images.
Super-Resolution Implementation
SRCNN was the first of the DL-based SR models. When it was introduced, the fact that it gave better results both visually and metrically than previous approaches on the existing datasets enabled the rise of DL-supported SR
studies. Researchers who wanted to obtain better results on commonly used datasets first modified SRCNN; later, instead of updating the existing model, different models with different learning strategies were proposed. Although its network architecture is simpler than that of current SR models, each proposed model is still compared with SRCNN in terms of both visual and metric values, as in previous studies. The approach put forward regarding the structure of the dataset in this section is in fact valid not only for SRCNN but also for other SR models. However, since the smoothing effect is more evident in the output image of SRCNN, it is more effective to use SRCNN to demonstrate the effect of the dataset-related approach.
The simplified architecture of the SRCNN model is presented in Figure 3.12. Because the aim is to reveal the improvability of HAPs and overcome their limitations rather than to show the superiority of one model over another, the SRCNN model was used directly. In order to compare the two datasets objectively, the original parameters for the number of filters and kernel sizes were not changed, nor were any additional techniques such as dropout, batch normalization, or L1/L2 regularization added. It was considered more realistic to use the model as provided in order to isolate the constraints arising from the dataset. In addition, the ability of the proposed dataset design to cope with the smoothing effect, which is frequently mentioned for SRCNN and similar models, was assessed.
Figure 3.12 : SRCNN network architecture (Adapted from Dong et al, 2014).
In SRCNN, the transformation of an LR input image into an HR output image is performed by means of learned weights. There are three main stages: feature extraction, non-linear mapping, and reconstruction. The input layer accepts an LR input image. The feature extraction layers apply convolution and activation operations to extract
features from the LR input image; these layers use learned filters to capture features in the image. The reconstruction layer creates an HR output image using the extracted features (Dong et al, 2014). This process allows the LR image to gain more detail and clarity. As the activation function, ReLU was used in the first two layers and a linear activation in the last layer. The filter numbers were 128, 64, and 1, respectively, while the kernel sizes were set to 9, 3, and 5. The training process was conducted for a scale factor of 4, with the input sizes for the HR-LR pair set to 256 and 64. ADAM was used as the optimization algorithm with a learning rate of 0.01. Early stopping was used during training to prevent overfitting and to make the training process more effective. Implementation and testing of the model were carried out in the Google Colab environment.
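A minimal Keras sketch of a network with the settings reported above; the framework choice, the bicubic pre-upsampling of the LR input to 256 x 256 (as in the original SRCNN pipeline), and the early-stopping patience are assumptions:

    import tensorflow as tf
    from tensorflow.keras import layers, models, callbacks

    def build_srcnn():
        """SRCNN-style network: filters 128/64/1, kernels 9/3/5,
        ReLU on the first two layers, linear single-band output."""
        model = models.Sequential([
            tf.keras.Input(shape=(256, 256, 3)),  # LR assumed pre-upsampled to HR size
            layers.Conv2D(128, 9, padding="same", activation="relu"),
            layers.Conv2D(64, 3, padding="same", activation="relu"),
            layers.Conv2D(1, 5, padding="same", activation="linear"),
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                      loss="mse")
        return model

    early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                         restore_best_weights=True)  # patience assumed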
IMAGE QUALITY ASSESSMENT
Image Quality Metrics
Peak Signal to Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM) are the most preferred metrics for images enhanced with SR. However, Root Mean Squared Error (RMSE), the Universal Image Quality Index (UIQI), and the Bhattacharyya distance were also included in the thesis study.
The RMSE value is a metric used to measure the similarity or difference of pixel values between two images (Girod, 1993). RMSE is a positive value, and a smaller value indicates that two images are closer or more similar to each other (Keshk et al, 2014; Greeshma and Bindu, 2020). RMSE is the square root of the mean squared error (MSE) and is calculated as follows:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - X_i)^2}    (4.1)
where:
Yi represents the value of the data point in HR;
Xi represents the value of the data point in the test image;
n represents the total number of data points.
PSNR measures how faithfully the image is represented. The higher the PSNR value, the higher the similarity between two images and the better the quality is considered. However, PSNR has the disadvantage of not reflecting the quality perceived by the human eye (Medda and DeBrunner, 2006; Greeshma and Bindu, 2020). It is usually expressed in decibels (dB) and is calculated by the formulas below:
MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - X_i)^2    (4.2)

PSNR = 10 \log_{10}\left(\frac{M^2}{MSE}\right)    (4.3)
where:
M represents the maximum possible value of pixel values;
MSE represents the mean square error between the HR and the test image.
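A minimal NumPy sketch of Eqs. (4.1)-(4.3) for 8-bit images:

    import numpy as np

    def rmse(hr, test):
        """Eq. (4.1): root of the mean squared pixel difference."""
        diff = hr.astype(np.float64) - test.astype(np.float64)
        return np.sqrt(np.mean(diff ** 2))

    def psnr(hr, test, max_val=255.0):
        """Eqs. (4.2)-(4.3): PSNR in dB; max_val = 255 for 8-bit data."""
        mse = np.mean((hr.astype(np.float64) - test.astype(np.float64)) ** 2)
        return 10.0 * np.log10(max_val ** 2 / mse)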
SSIM attempts to measure the structural similarity between two images and expresses it as a score in the range [0, 1], where 1 indicates the highest similarity and best quality, and 0 the lowest similarity and worst quality. Its calculation includes three components: luminance similarity, contrast similarity, and structural similarity. Luminance similarity measures the similarity between the luminance components of two images; contrast similarity measures the similarity between their contrast components; and structural similarity measures the similarity between their structural patterns. By combining these three components, SSIM produces an overall similarity score (Wang et al, 2004; Greeshma and Bindu, 2020). The higher the SSIM score between two images, the higher the similarity and quality. SSIM is calculated by the formula below:
SSIM(x, y) = [l(x, y)]^{\alpha} \times [c(x, y)]^{\beta} \times [s(x, y)]^{\gamma}    (4.4)
where:
l is the luminance;
c is the contrast;
s is the structure;
α, β, γ are the positive constants.
l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}    (4.5)

c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}    (4.6)

s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}    (4.7)
where:
μx and μy are the local means;
σx and σy are the standard deviations;
σxy is the cross-covariance for images x and y sequentially.
In the case of α = β = γ =1, SSIM formula is simplified as follows:
SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}    (4.8)
Universal Image Quality Index (UIQI) (Zhou and Bovik, 2002) is another metric used to evaluate image quality. It is known that the UIQI metric is better than PSNR and RMSE in reflecting the quality perception of the human visual system (Medda and DeBrunner, 2006). This metric determines the quality of an image using brightness and contrast distortion as well as correlation loss. UIQI is calculated by the formula below:
Q(x, y) = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \times \frac{2\mu_x \mu_y}{\mu_x^2 + \mu_y^2} \times \frac{2\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}    (4.9)
The first term of the formula is the correlation coefficient between images x and y. The second term measures the similarity of the mean luminance of the two images; if the means of both images are equal, this term equals 1. The third term refers to the similarity of contrast in the two images.
The fifth metric used in the study is the Bhattacharyya distance value (Aherne et al, 1998), which is a statistical metric that measures the similarity or difference between two probability distributions. Values range from 0 to ∞, with a smaller Bhattacharyya distance indicating that two distributions are more similar and a larger distance indicating more dissimilarity or separation (Goudail et al, 2004). Bhattacharyya distance is calculated according to the following formula:
D_B(H_1, H_2) = -\ln\left(\sum_{i=1}^{n} \sqrt{h_{1i} \, h_{2i}}\right)    (4.10)
where:
H1 and H2 are the histograms;
h1i and h2i are the values of bin i in H1 and H2, respectively.
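A minimal NumPy sketch of Eq. (4.10) on 8-bit grayscale histograms (the small epsilon guarding the logarithm is an implementation assumption):

    import numpy as np

    def bhattacharyya_distance(img1, img2, bins=256):
        """Eq. (4.10): distance between normalized grayscale histograms."""
        h1, _ = np.histogram(img1, bins=bins, range=(0, 256))
        h2, _ = np.histogram(img2, bins=bins, range=(0, 256))
        p1 = h1 / h1.sum()
        p2 = h2 / h2.sum()
        return -np.log(np.sum(np.sqrt(p1 * p2)) + 1e-12)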
Visual Interpretation
Visual interpretation is what a person sees and understands with the naked eye. Although it technically refers to evaluating an image by considering the photo interpretation elements, for SR it means observing which image is clearer and better than another. It is independent of statistics and involves subjective evaluation. However, completely opposite evaluations by different persons are not expected in the visual interpretation of an improved image.
RESULTS AND DISCUSSION
The results obtained from the two approaches to dataset content and structure were discussed with both image quality metrics and visual interpretation. The different evaluation methods are presented together in order to assess the compatibility of image quality metrics and visual interpretation. Metrics beyond those frequently preferred in the literature were included in the evaluation. Based on the proposed approach, it was examined whether the visually detected positive progress paralleled the trend of change in the metric values. In addition, the prominence of the smoothing effect in the resulting image was visually interpreted in the enhanced images from both approaches. In fact, minimizing or eliminating this effect can also be achieved by modifications to network designs. However, a modification that helps on one dataset may not yield similar positive results on another. Therefore, a parameter reported as necessary in one study may have no positive effect on the resulting image in subsequent studies, contributing nothing beyond extra load on the model.
The findings obtained with the SRCNN model for the approaches to dataset content and dataset structure are presented in the images below. In the comparative analyses, the improved images obtained using dataset-1 and dataset-2 are presented together with the LR and HR images. The LR test images were obtained from the original image, considered as HR, by downscaling with bicubic interpolation; a sketch of this LR generation step is given below. The scale factor used is 4, meaning that the LR image is 4 times smaller than the HR image and is enlarged 4 times with the SR technique. The reason why LR is presented at the same dimensions as the HR and improved images is so that the difference between LR and HR can be clearly detected by the human eye and, accordingly, the progress in the improved images can be detected more easily. For the approach to the dataset structure, the two final images obtained are presented together.
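A minimal sketch of this synthetic LR generation (Python with Pillow):

    from PIL import Image

    def make_lr(hr_path, scale=4):
        """Create the LR test image by bicubic downscaling of the HR original."""
        hr = Image.open(hr_path)
        return hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)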
Figure 5.1 : Visual comparison between LR, HR, and improved images for farmland class.
Figure 5.2 : Visual comparison between LR, HR, and improved images for residential class.
Figure 5.3 : Visual comparison between LR, HR, and improved images for forest class.
Figure 5.4 : Visual comparison between LR, HR, and improved images for bareland class.
Image quality metrics calculated for the images enhanced under the approach to dataset content are presented in Table 5.1 and Table 5.2. The first two metrics perform pixel-based comparisons, while the SSIM and UIQI metrics take structural similarity into account. The UIQI metric is rarely encountered in SR studies. However, there are cases where the PSNR and SSIM metrics do not give consistent results together: a visually better image may have a higher SSIM value yet a lower PSNR value than a visually worse image. For this reason, the UIQI metric was also used in order to interpret the results more reliably. The Bhattacharyya metric works on the difference between two image histograms. Although it is not a preferred metric in SR studies, it is known that metrics based on different principles give variable results on different datasets and models. Therefore, it was examined whether metrics other than those frequently used in the literature could be employed in the evaluation of enhanced historical aerial photographs.
Table 5.1 : Metric results obtained for different classes from dataset-1/dataset-2 in the approach to dataset content.

Class         RMSE        PSNR          SSIM            UIQI            Bhattacharyya
Farmland      5.80/6.32   32.86/32.11   0.8773/0.8659   0.9182/0.9106   0.025/0.032
Residential   5.49/5.99   33.33/32.58   0.8692/0.8566   0.9128/0.9044   0.022/0.029
Bareland      4.66/4.69   34.76/34.71   0.8564/0.8514   0.9042/0.9009   0.053/0.045
Forest        4.54/4.66   34.99/34.77   0.8869/0.8796   0.9246/0.9197   0.025/0.018
Table 5.2 : Metric results obtained for different mixed areas from dataset-1/dataset-2 in the approach to dataset content.

Image     RMSE        PSNR          SSIM            UIQI            Bhattacharyya
Mixed-1   5.38/5.74   33.50/32.95   0.8737/0.8647   0.9158/0.9098   0.025/0.022
Mixed-2   5.01/5.44   34.13/33.42   0.8828/0.8719   0.9219/0.9145   0.026/0.021
Mixed-3   5.50/5.89   33.32/32.72   0.8798/0.8701   0.9199/0.9133   0.016/0.019
Mixed-4   5.28/5.54   33.67/33.26   0.8675/0.8587   0.9116/0.9058   0.019/0.017
Mixed-5   5.22/5.57   33.76/33.20   0.8757/0.8666   0.9171/0.9110   0.016/0.015
Mixed-6   5.18/5.51   33.83/33.30   0.8724/0.8619   0.9150/0.9079   0.016/0.017
Mixed-7   5.29/5.67   33.65/33.06   0.8729/0.8634   0.9152/0.9089   0.016/0.018
Mixed-8   5.77/6.03   32.91/32.52   0.8420/0.8338   0.8947/0.8892   0.017/0.020
Figure 5.5 : Visual comparison between enhanced images for mixed area (approach to dataset content).
Figure 5.6 : Visual comparison between enhanced images for mixed area (approach to dataset content).
Figure 5.7 : Visual comparison between enhanced images for mixed area (approach to dataset structure).
Figure 5.8 : Visual comparison between enhanced images for mixed area (approach to dataset structure).
Tables 5.3 and 5.4 show the results obtained with the approach to the dataset structure. Sample visuals related to these tables are Figures 5.7 and 5.8.
Table 5.3 : Metric results obtained for different mixed areas from the images concatenated according to the average of the images (approach to dataset structure).

Image     RMSE   PSNR    SSIM     UIQI     Bhattacharyya
Mixed-1   5.48   33.35   0.8657   0.9104   0.036
Mixed-2   5.14   33.91   0.8753   0.9168   0.035
Mixed-3   5.62   33.13   0.8716   0.9144   0.021
Mixed-4   5.39   33.50   0.8581   0.9053   0.025
Mixed-5   5.33   33.59   0.8686   0.9124   0.023
Mixed-6   5.31   33.63   0.8633   0.9088   0.022
Mixed-7   5.41   33.47   0.8661   0.9107   0.025
Mixed-8   5.82   32.83   0.8339   0.8893   0.024
Table 5.4 : Metric results obtained for different mixed areas from the images concatenated according to the BRISQUE values of the images (approach to dataset structure).

Image     RMSE   PSNR    SSIM     UIQI     Bhattacharyya
Mixed-1   5.49   33.33   0.8702   0.9135   0.021
Mixed-2   5.14   33.90   0.8788   0.9192   0.022
Mixed-3   5.65   33.08   0.8745   0.9163   0.015
Mixed-4   5.37   33.52   0.8628   0.9085   0.017
Mixed-5   5.36   33.52   0.8718   0.9145   0.015
Mixed-6   5.29   33.65   0.8684   0.9123   0.016
Mixed-7   5.43   33.42   0.8690   0.9127   0.015
Mixed-8   5.82   32.83   0.8403   0.8935   0.016
The SR application was carried out with a scale factor of 4. The images of various sizes in the training dataset were arranged as 256 x 256 and accepted as HR. Their four-times-smaller versions, obtained by interpolation, were accepted as LR with a size of 64 x 64. Based on these values, the SR application was performed on the two datasets. Dataset-1 embodied the content-based approach recommended in the thesis, while dataset-2 kept the brightness values within a limited range and contains cropped images of size 512 x 512. When the LR image, HR image, and improved images are evaluated by visual interpretation, it can be clearly seen that the dataset-1 results are closer to the HR images. The negative effect that occurs in images enhanced in SR studies is a smoothing effect, and the proposed approach aims to obtain higher metric values by minimizing this effect. When the images enhanced with dataset-2 are examined, it can easily be observed visually that they
contain more smoothing than the images enhanced with dataset-1. At this point, it is suggested specifically for HAPs that the same region be represented by images from different years, so that features are extracted from three different bands. The intensity values of the images varying over a wide range, i.e., being distributed homogeneously, and the training image sizes being larger than those used in the literature had a positive impact on the results. However, the smoothing effects that remain present in the dataset-1 results can be further reduced with models using different learning strategies.
When the image quality metric results for the content-based approach are examined, the values obtained for both enhanced test sets, presented in Table 5.1 and Table 5.2, are consistent with visual interpretation. The preceding values are for dataset-1, which embodies the proposed approach; among the values presented together, those on the right belong to dataset-2. The differences between the paired values for the enhanced images are smaller than those reported in the literature. However, this is expected, as a single model was used on different datasets. In addition, as the scale factor increases in SR studies, the metric values produced by different models grow closer to each other. In this study, 4 was used as the scale factor, since suggestions for solving a real-world problem were being presented. In HAPs, the amount of distortion introduced when downsampling the original HR image to LR was greater than that observed when downsampling natural images such as Set5 and Set14. This situation limits the features the model can learn during training; therefore, the ability to improve the test image may have been limited.
Consequently, the content-based approach achieved positive progress in terms of both pixel-based metrics and metrics that emphasize structural similarity. The study is distinguished by the use of the UIQI and Bhattacharyya metrics, which are not used in most SR studies. The results for the UIQI metric showed a trend similar to SSIM, although the differences between the values obtained for the two scenarios were smaller than with SSIM. This may be because the range covered by UIQI values is wider. Because UIQI can be used with a color-sensitive approach, it could be applied more efficiently once grayscale aerial photographs are colorized. Compared to Bhattacharyya, it was determined that UIQI is usable in the evaluation of enhanced grayscale aerial photographs. Considering all test photographs, it was found that the
Bhattacharyya measure was not sensitive enough to the current dataset, despite the fact that both metrics produced favorable findings. Lower Bhattacharyya distance values were obtained in some of the images where the smoothing effect was more prominent, with the exception of the test images provided as examples; in this scenario, the lower distances suggest a greater similarity between the two images. Furthermore, it is currently not feasible to assess this metric effectively for HAPs, because the distance values fall within a relatively limited range.
The approach to the structure of the dataset differs from its counterparts in that it includes no-reference image quality metrics. The number of metrics that determine the quality of an image without a reference image is much smaller than the number that determine quality between two images. For this reason, BRISQUE, the best known of the few such metrics, was used. A lightweight model can handle only a limited number of images despite its speed. Therefore, instead of a dataset containing four different classes with the same number of images, it is recommended that the same number of images belonging to these classes be trained separately. The main goal is to improve the LR image with weights from three different trainings, following the hierarchy of PIEs, and then gather the improved details together. What is important at this point is determining which image has the better features. The better of the identical parts were concatenated based on BRISQUE values. To demonstrate the effect of this approach, a separate final image was also created by averaging the three improved images of each test image. The same metrics used in the content-based approach were applied. The images in Figures 5.7 and 5.8 reveal that the BRISQUE metric is successful in selecting reference parts. The results in Tables 5.3 and 5.4 that take structural similarity into account also support the visual interpretation. However, pixel-based metrics did not provide evaluable results for the PIE-based method. When Table 5.4 is compared with Table 5.3, it is seen that the results of the PIE-based approach are lower than those obtained from dataset-1 in the content-based approach and higher than those obtained from dataset-2. However, there is no significant difference detectable by visual interpretation between the images improved based on dataset-1 in Figures 5.5 and 5.6 and the images improved based on the BRISQUE metric in Figures 5.7 and 5.8.
CONCLUSIONS AND RECOMMENDATIONS
One disadvantage of enhancing an RS image with SR compared to other common datasets is that it contains objects in numerous different distributions, in addition to atmospheric and environmental conditions, which affect the effectiveness. These factors prevent the desired results from being achieved when models built on common datasets are applied directly to RS images, because it is difficult to claim that existing SR models are generalizable in all cases. Therefore, in some SR models the network design has been simplified, while in others it has been made more complex to obtain better output images. The differences among RS image datasets in the literature have hindered the adoption of a single clear approach; the main reasons are the total number of images in the datasets, the variability in image sizes, and the variability in their contents.
In this thesis study, grayscale HAPs have been enhanced using lightweight SR models. HAPs have additional constraints beyond the negative factors found in other optical RS images: single-band representation, lack of color, and insufficient content of LR-HR pairs. In this context, various approaches have been introduced to minimize these constraining factors, and these approaches have been data-oriented. Although the performance of DL-based SR models depends on factors such as the loss function used, evaluation criteria, and network design, the contribution of the others is limited if the dataset does not conform to the characteristics of the image to be enhanced. Therefore, data-oriented approaches for HAPs have been designed in terms of both content and structure.
When it comes to HAPs, the aforementioned additional constraining effects make the enhancement process even more challenging. Approaches aimed at both the content and the structure of the dataset have been introduced to alleviate this difficulty. In terms of the content-oriented approach, the limitation arising from images being single-band has been eliminated for running algorithms and minimized with regard to algorithms exploiting different spectral properties. Artificial three-band images were obtained by using two copies of each image, which eliminated the algorithmic barrier to running the SR model. Images containing the same region were derived from orthophotos of different years and resolutions to imitate multispectral data. Although not all bands of
each image could be included, three different bands of the same region with seemingly different characteristics were included in the training.
The indirect result of being single-band is that the images are grayscale; that is, HAPs do not contain any color information. Color information creates differences both among objects of the same class and among objects of different classes in the dataset, and it is therefore a factor that enables DL-based SR models to learn different features. In grayscale images, intensity values partly create this difference. The dataset was enriched by using the widest possible range of intensity values for the included images. For example, although most forested areas have low intensity values, those with slightly higher intensity values due to differences in illumination conditions were included in the dataset to obtain training data that the model would identify differently. Similarly, although farmland areas mostly have high intensity values, farmland areas with low intensity values were included in the training through the applied methodology.
The final constraint is the insufficient information that can be learned from HAPs in terms of HR-LR image pairs. Details of residential areas are scarce in older images, depending on when the data were acquired; similarly, details of farmland areas are scarce in newer images. Images of higher dimensions than those used in the datasets in the literature were employed during training to facilitate learning by the model. As the content of the images is already limited, convolution operations would have covered fewer details had smaller images been used. Increasing the area over which the kernels move, by using larger image sizes, allowed more information to be gathered.
With the proposed minimization approaches for the dataset content, much higher-quality output images were obtained. Positive progress was observed when image quality metrics and visual interpretation were evaluated together.
In the proposed approach for the structure of the dataset, it is suggested that the photographs created under the content-based approach be categorized according to the hierarchy of PIEs. Lightweight SR models struggle to cope with datasets containing a large number of images because they have fewer parameters. For a DL-based SR model to learn adequately, there must be enough images in the dataset; otherwise, the model may tend to overfit. In this context, instead of training all images
in different classes at once, it has been suggested that images in these classes should be categorized according to the hierarchy of photo interpretation elements and trained independently of each other. Images belonging to bare land and forested areas are included in the primary group where tone and color information is present. Residential areas are included in the secondary group where shape, size, and texture are present. Agricultural areas are included in the tertiary group where patterns are present.
The test image was improved separately with the different training weights obtained in the structure-based approach. The question at this point is how to obtain the better-quality parts of the three different improved images of each test image. For this purpose, BRISQUE, a no-reference quality metric, was used. The BRISQUE value was calculated for the equal parts obtained starting from the upper left corner of each enhanced image. Since lower BRISQUE values represent higher quality, the parts with the lowest values among the three images were selected, and the final image was obtained by concatenating all selected parts. The images obtained in this way were indistinguishable from those of the content-based approach in terms of visual perception, but were not better in terms of image quality metrics.
HAPs are image data acquired in the past. They play a crucial role in present-day problems associated with the past. However, when used in legal matters or academic studies where temporal changes are detected, their resolution may be inadequate. Their use after enhancement with DL-based SR techniques will therefore increase the accuracy and reliability of such studies.
To achieve better results in enhancing HAPs with SR techniques, higher-resolution images spanning a broader range of years can be used in lighter models. This will positively impact the quality of the dataset, enabling the model to learn better. In particular, increasing image diversity can further advance the results obtained with the structural approach.
In SR applications, the amount of improvement achieved on synthetically produced test images obtained by interpolation cannot be matched on test images that come from an external source and are enhanced directly. Therefore, the proposed approaches require other image processing techniques beyond increasing the richness of the images in the dataset. In this context, adding currently nonexistent color information to HAPs using DL techniques will be an important step. Color information is one of the
most significant deficiencies of HAPs. Therefore, the colorization process will make the imitation of multispectral data more powerful.
REFERENCES
Agustsson, E., and Timofte, R. (2017). NTIRE 2017 challenge on single image super-resolution: Dataset and study, Proceedings of the IEEE Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July.
Aherne, F. J., Thacker, N. A., and Rockett, P. I. (1998). The Bhattacharyya metric as an absolute similarity measure for frequency coded data, Kybernetika, 34(4), 363-368.
Ahn, N., Kang, B., and Sohn, K. A. (2018). Fast, accurate, and lightweight super-resolution with cascading residual network. Proceedings of the European conference on computer vision (ECCV), Munich, Germany, 8-14 September.
Al-Mekhlafi, H. and Liu, S. (2024). Single image super-resolution: a comprehensive review and recent insight. Frontiers of Computer Science, 18, 181702.
Anwar, S., Khan, S., and Barnes, N. (2020). A deep journey into super-resolution: A survey. ACM Computing Surveys, 53(3), 1–34.
Aplin, P., Atkinson, P.M., and Curran, P.J. (1997). Fine spatial resolution satellite sensors for the next decade, International Journal of Remote Sensing, 18, 3873-3881.
Bakirman, T., Bayram, B., Akpinar, B., Karabulut, M. F., Bayrak, O. C., Yigitoglu, A., and Seker, D. Z. (2020). Implementation of ultra-light UAV systems for cultural heritage documentation, Journal of Cultural Heritage, 44, 174-184.
Bevilacqua, M., Roumy, A., Guillemot, C., and Alberi-Morel, M. L. (2012). Low-complexity single-image super resolution based on nonnegative neighbor embedding, Proceedings of the British Machine Vision Conference, Surrey, UK, 3–7 September.
Caballero, J., Ledig, C., Aitken, A., Acosta, A., Totz, J., Wang, Z., and Shi, W. (2017). Real-time video super-resolution with spatio-temporal networks and motion compensation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July.
Chauhan, K., Patel, S.N., Kumhar, M., Bhatia, J., Tanwar, S., Davidson, I.E., Mazibuko, T.F., and Sharma, R. (2023). Deep Learning-Based Single-Image Super-Resolution: A Comprehensive Review, IEEE Access, 11, 21811-21830.
Chen, H., He, X., Qing, L., Wu, Y., Ren, C., Sheriff, R. E., and Zhu, C. (2022). Real-world single image super-resolution: A brief review, Information Fusion, 79, 124-145.
Cheng, G., Han, J., and Lu, X. (2017). Remote Sensing Image Scene Classification: Benchmark and State of the Art, Proceeding of the IEEE, 105(10), 1865–1883.
Cooper, M.A.R., and Robson, S. (1994). A hierarchy of photogrammetric records for archaeology and architectural history, ISPRS Journal of Photogrammetry and Remote Sensing, 49(5), 31-37.
Dai, D., and Yang, W. (2010). Satellite image classification via two-layer sparse coding with biased image representation, IEEE Geoscience and Remote Sensing Letters, 8, 173–176.
Dai, T., Cai, J., Zhang, Y., Xia, S.T., and Zhang, L. (2019). Second-order attention network for single image super-resolution. Proceedings of the IEEE Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June.
Deeba, F., Dharejo, F.A., Zhou, Y., Ghaffar, A., Memon, M.H., and Kun, S. (2020). Single image super-resolution with application to remote-sensing image. Proceedings of Global Conference on Wireless and Optical Technologies, Malaga, Spain, 6-8 October.
Dixit, M., and Yadav, R.N. (2023). A Review of Single Image Super Resolution Techniques using Convolutional Neural Networks, Multimedia Tools and Applications, 83, 29741–29775.
Dong, C., Loy, C.C., and Tang, X. (2016). Accelerating the super-resolution convolutional neural network, Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October.
Dong, C., Loy, C.C., He, K., and Tang, X. (2014). Learning a deep convolutional network for image super-resolution. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September.
Doyle, F. (1964). The historical development of analytical photogrammetry, Photogrammetric Engineering, 30(2), 259-265.
Farsiu, S., Robinson, D., Elad, M., and Milanfar, P. (2004). Advances and challenges in super‐resolution, International Journal of Imaging Systems and Technology, 14(2), 47-57.
Fujimoto, A., Ogawa, T., Yamamoto, K., Matsui, Y., Yamasaki, T., and Aizawa, K. (2016). Manga109 dataset and creation of metadata, Proceedings of the 1st International Workshop on Comics Analysis, Processing and Understanding, Cancun, Mexico, 4 December.
Gates, J. W. C. (1984). Photogrammetry in science and technology. Photogrammetric Record, 11(63), 297-279.
Gendy, G., He, G., and Sabor, N. (2023). Lightweight image super-resolution based on deep learning: State-of-the-art and future directions. Information Fusion, 94, 284-310.
Girod, B. (1993). What's wrong with mean-squared error? In Digital Images and Human Vision (pp. 207–220), MIT Press, London, UK.
Goudail, F., Réfrégier, P., and Delyon, G. (2004). Bhattacharyya distance as a contrast parameter for statistical processing of noisy optical images, Journal of the Optical Society of America, 21, 1231–1240.
Greeshma, M.S., and Bindu, V.R. (2020). Super-resolution quality criterion (SRQC): A super-resolution image quality assessment metric, Multimedia Tools and Applications, 79, 35125–35146.
Guo, J., Sun, X., Zhang, Y., Fu, K., and Wang, L. (2019). Deep residual squeeze and excitation network for remote sensing image super-resolution, Remote Sensing, 11, 1–16.
Guo, M., Zhang, Z., Liu, H., and Huang, Y. (2022). NDSRGAN: a novel dense generative adversarial network for real aerial imagery super-resolution reconstruction, Remote Sensing, 14, 1574.
Ha, V.K., Ren, J.C., Xu, X.Y., Zhao, S., Xie, G., Masero, V., and Hussain, A. (2019). Deep learning based single image super-resolution: A survey, International Journal of Automation and Computing, 16, 413–426.
Hao, X., Liu, L., Yang, R., Yin, L., Zhang, L., and Li, X. (2023). A Review of Data Augmentation Methods of Remote Sensing Image Target Recognition, Remote Sensing, 15(3), 827.
Haris, M., Shakhnarovich, G., and Ukita, N. (2018). Deep Back-Projection Networks for Super-Resolution, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18-23 June.
Haut, J.M., Fernandez-Beltran, R., Paoletti, M.E., Plaza, J., and Plaza, A. (2019). Remote sensing image superresolution using deep residual channel attention, IEEE Transactions on Geoscience and Remote Sensing, 57, 9277–9289.
Huang, J.B., Singh, A., and Ahuja, N. (2015). Single image super-resolution from transformed self-exemplars, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June.
Huang, N., Yang, Y., Liu, J., Gu, X., and Cai, H. (2017). Single-image super-resolution for remote sensing data using deep residual-learning neural network, Proceedings of the International Conference on Neural Information Processing, Guangzhou, China, 14-18 November.
Jiang, K., Wang, Z., Yi, P., Jiang, J., Xiao, J., and Yao, Y. (2018). Deep distillation recursive network for remote sensing imagery super-resolution. Remote Sensing, 10, 1–23.
Jiang, K., Wang, Z., Yi, P., Wang, G., Lu, T., and Jiang, J. (2019). Edge-enhanced GAN for remote sensing image superresolution, IEEE Transactions on Geoscience and Remote Sensing, 57, 5799–5812.
62
Keshk, H.M., Abdel-Aziem, M., Ali, A.S., and Assal, M.A. (2014). Performance evaluation of quality measurement for super-resolution satellite images, Proceedings of the Science and Information Conference, London, UK, 27–29 August.
Khan, R., Sablatnig, R., Bais, A., and Khawaja, Y. M. (2011). Comparison of reconstruction and example-based super-resolution. Proceedings of the 7th International Conference on Emerging Technologies, Islamabad, Pakistan, 5-6 September.
Khoo, J. J. D., Lim, K. H., and Phang, J. T. S. (2020). A review on deep learning super resolution techniques, Proceedings of the IEEE 8th Conference on Systems, Process and Control (ICSPC), Melaka, Malaysia, 11-12 December.
Kim, J., Lee, J.K., and Lee, K.M. (2016a). Accurate image super-resolution using very deep convolutional networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June.
Kim, J., Lee, J.K., and Lee, K.M. (2016b). Deeply-recursive convolutional network for image super-resolution, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June.
Kim, S., Jun, D., Kim, B.-G., Lee, H., and Rhee, E. (2021). Single Image Super-Resolution Method Using CNN-Based Lightweight Neural Networks, Applied Sciences, 11, 1092.
Kovanič, Ľ., Topitzer, B., Peťovský, P., Blišťan, P., Gergeľová, M.B., and Blišťanová, M. (2023). Review of Photogrammetric and Lidar Applications of UAV, Applied Sciences, 13, 6732.
Lanaras, C., Bioucas-Dias, J., Galliani, S., and Baltsavias, E. (2018). Super-resolution of Sentinel-2 images: learning a globally applicable deep neural network, ISPRS Journal of Photogrammetry and Remote Sensing, 146, 305–319.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., … and Wang, Z. (2017). Photo-realistic single image super-resolution using a generative adversarial network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July.
Lei, S., Shi, Z., and Zou, Z. (2017). Super-resolution for remote sensing images via local-global combined network. IEEE Geoscience and Remote Sensing Letters, 14, 1243–1247. https://doi.org/10.1109/LGRS.2017.2704122.
Li, J., Fang, F., Mei, K., and Zhang, G. (2018). Multi-scale residual network for image super-resolution, Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September, pp. 517-532.
Li, K., Yang, S., Dong, R., Wang, X., and Huang, J. (2020). Survey of single image super-resolution reconstruction. IET Image Processing, 14, 2273–2290.
Liebel, L., and Korner, M. (2016). Single-image super resolution for multispectral remote sensing data using convolutional neural networks, ISPRS International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 41, 883–890.
Lillesand, T., Kiefer, R. W., and Chipman, J. (2015). Remote sensing and image interpretation, John Wiley & Sons, United States of America.
Lim, B., Son, S., Kim, H., Nah, S., and Lee, K.M. (2017). Enhanced deep residual networks for single image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July.
Liu, B., Li, H., Zhou, Y., Peng, Y., Elazab, A., and Wang, C. (2020). A super resolution method for remote sensing images based on cascaded conditional Wasserstein GANs, Proceedings of the 3rd IEEE International Conference on Information Communication and Signal Processing (ICICSP), Shanghai, China, 12-15 September.
Ludwig, M., Runge, C.M., Friess, N., Koch, T. L., Richter, S., Seyfried, S., ... and Nauss, T. (2020). Quality assessment of photogrammetric methods—A workflow for reproducible UAS orthomosaics, Remote Sensing, 12(22), 3831.
Martin, D., Fowlkes, C., Tal, D., and Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 9–12 July.
Medda, A., and DeBrunner, V. (2006). Color image quality index based on the UIQI, Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation, Denver, CO, USA, 26–28 March.
Mikamoto, Y., Kaminaka, Y., Higaki, T., Raytchev, B., and Kaneda, K. (2023). Spectral Super-Resolution for High Dynamic Range Images. Journal of Imaging, 9, 83.
Mittal, A., Soundararajan, R., and Bovik, A.C. (2013). Making a “completely blind” image quality analyzer. IEEE Signal Processing Letters, 20, 209–212.
Nasrollahi, K., and Moeslund, T. B. (2014). Super-resolution: a comprehensive survey, Machine Vision and Applications, 25, 1423-1468.
Ooi, Y.K., and Ibrahim, H. (2021). Deep learning algorithms for single image super-resolution: A systematic review, Electronics, 10, 867.
Park, S. C., Park, M. K., and Kang, M. G. (2003). Super-resolution image reconstruction: a technical overview. IEEE Signal Processing Magazine, 20(3), 21-36.
Chen, Q., Qiang, Z., and Lin, H. (2022). A Review of Super Resolution Based on Deep Learning, Proceedings of the IEEE 8th International Conference on Computer and Communications (ICCC), Chengdu, China, 9-12 December.
Ren, C., He, X., Qing, L., Wu, Y., and Pu, Y. (2021). Remote sensing image recovery via enhanced residual learning and dual-luminance scheme, Knowledge-Based Systems, 222, 107013.
Shermeyer, J., and Van Etten, A. (2019). The effects of super-resolution on object detection performance in satellite imagery, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16-17 June.
Singla, K., Pandey, R., and Ghanekar, U. (2022). A review on Single Image Super Resolution techniques using generative adversarial network, Optik, 266, 169607.
Siu, W. C., and Hung, K. W. (2012). Review of image interpolation and super-resolution, Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Hollywood, CA, USA, 3-6 December.
Štroner, M., Urban, R., Seidl, J., Reindl, T., and Brouček, J. (2021). Photogrammetry using UAV-mounted GNSS RTK: Georeferencing strategies without GCPs, Remote Sensing, 13(7), 1336.
Sun, L., Liu, Z., Sun, X., Liu, L., Lan, R., and Luo, X. (2021). Lightweight image super-resolution via weighted multi-scale residual network, IEEE/CAA Journal of Automatica Sinica, 8(7), 1271-1280.
Tai, Y., Yang, J., and Liu, X. (2017a). Image super-resolution via deep recursive residual network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July.
Tai, Y., Yang, J., Liu, X., and Xu, C. (2017b). MemNet: A Persistent Memory Network for Image Restoration, Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22-29 October.
Tong, T., Li, G., Liu, X., and Gao, Q. (2017). Image super-resolution using dense skip connections, Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22-29 October.
Truong, N.T., Vo, N.D., and Nguyen, K. (2020). The effects of super-resolution on object detection performance in an aerial image, Proceedings of the 7th NAFOSTED Conference on Information and Computer Science (NICS), Ho Chi Minh City, Vietnam, 26-27 November.
Tuna, C., Unal, G., and Sertel, E. (2018). Single-frame super resolution of remote-sensing images by convolutional neural networks, International Journal of Remote Sensing, 39, 2463–2479.
Url-1: <http://geodata.vermont.gov>, date retrieved 10.01.2024.
Url-2: <http://gis.stackexchange.com/questions/397523/>, date retrieved 17.11.2023.
Van Reeth, E., Tham, I. W., Tan, C. H., and Poh, C. L. (2012). Super‐resolution in magnetic resonance imaging: a review. Concepts in Magnetic Resonance Part A, 40(6), 306-325.
Wagner, L., Liebel, L., and Korner, M. (2019). Deep residual learning for single-image super-resolution of multi-spectral satellite imagery, ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 4, 189–196.
Wang, P., and Sertel, E. (2021a). Channel–spatial attention-based pan-sharpening of very high-resolution satellite images. Knowledge-Based Systems, 229, 107324.
Wang, P., Bayram, B., and Sertel, E. (2022b). A comprehensive review on deep learning based remote sensing image super-resolution methods, Earth-Science Reviews, 232, 104110.
Wang, P., Bayram, B., and Sertel, E. (2021b). Super-resolution of remotely sensed data using channel attention based deep learning approach, International Journal of Remote Sensing, 42, 6050-6067.
Wang, X., Yi, J., Guo, J., Song, Y., Lyu, J., Xu, J., … and Min, H. (2022a). A review of image super-resolution approaches based on deep learning and applications in remote sensing. Remote Sensing, 14, 5423.
Wang, X., Yu, K., Dong, C., and Loy, C.C. (2018). Recovering realistic texture in image super-resolution by deep spatial feature transform, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June, pp. 606–615.
Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., and Loy, C.C. (2018). ESRGAN: Enhanced super-resolution generative adversarial networks, Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September.
Wang, Z., Bovik, A., Sheikh, H.R., and Simoncelli, E.P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13, 600–612.
Wang, Z., Chen, J., and Hoi, S.C.H. (2020). Deep learning for image super-resolution: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 3365–3387.
Wei, W., Yongbin, J., Yanhong, L., Ji, L., Xin, W., and Tong, Z. (2019). An advanced deep residual dense network (DRDN) approach for image super-resolution, International Journal of Computational Intelligence Systems, 12, 1592–1601.
Xia, G.S., Hu, J., Hu, F., Shi, B., Bai, X., Zhong, Y., Zhang, L., and Lu, X. (2017). AID: A benchmark data set for performance evaluation of aerial scene classification, IEEE Transactions on Geoscience and Remote Sensing, 55, 3965–3981.
Yang, J., Wright, J., Huang, T.S., and Ma, Y. (2010). Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19, 2861–2873.
Yang, Y., and Newsam, S. (2010). Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 3–5 November.
Yu, M., Shi, J., Xue, C., Hao, X., and Yan, G. (2023). A review of single image super-resolution reconstruction based on deep learning. Multimedia Tools and Applications, 83, 55921–55962.
Zeyde, R., Elad, M., and Protter, M. (2010). On single image scale-up using sparse-representations, Proceedings of the International Conference on Curves and Surfaces, Avignon, France, 24–30 June.
Zhang, J., Xu, T., Li, J., Jiang, S., and Zhang, Y. (2022). Single-Image Super Resolution of Remote Sensing Images with Real-World Degradation Modeling, Remote Sensing, 14, 2895.
Zhang, Y., Tian, Y., Kong, Y., Zhong, B., and Fu, Y. (2018a). Residual dense network for image super-resolution, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June.
Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., and Fu, Y. (2018b). Image super-resolution using very deep residual channel attention networks, Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September.
Zhao, L., Tang, P., and Huo, L. (2016). Feature significance-based multibag-of-visual-words model for remote sensing image scene classification. Journal of Applied Remote Sensing, 10, 035004.
Zhou, F., Yang, W., and Liao, Q. (2012). Interpolation-based image super-resolution using multisurface fitting, IEEE Transactions on Image Processing, 21(7), 3312-3318.
Zhou, W., and Bovik, A.C. (2002). A universal image quality index. IEEE Signal Processing Letters, 9(3), 81-84.
Zhu, H., Xie, C., Fei, Y., and Tao, H. (2021). Attention Mechanisms in CNN-Based Single Image Super-Resolution: A Brief Review and a New Perspective, Electronics, 10, 1187.
Zou, Q., Ni, L., Zhang, T., and Wang, Q. (2015). Deep learning based feature selection for remote sensing scene classification, IEEE Geoscience and Remote Sensing Letters, 12, 2321–2325.
CURRICULUM VITAE
Name Surname : Abdullah Harun INCEKARA
EDUCATION :

B.Sc. : 2016, Istanbul Technical University, Faculty of Civil Engineering, Department of Geomatics Engineering

M.Sc. : 2018, Istanbul Technical University, Graduate School, Geomatics Engineering Programme
PUBLICATIONS, PRESENTATIONS AND PATENTS ON THE THESIS:

Incekara, A.H., and Seker, D.Z. (2024). Improving the Quality of Grayscale Historical Images With Super-Resolution Technique. AICMES 5th International Conference on Current Scientific Studies, Amman, Jordan, 14-17 March.

Incekara, A.H., Alganci, U., Arslan, O., and Seker, D.Z. (2024). Minimizing the Limitations in Improving Historical Aerial Photographs With Super-Resolution Technique, Applied Sciences, 14(4), 1495. doi:10.3390/app14041495.
OTHER PUBLICATIONS, PRESENTATIONS AND PATENTS:
Incekara, A.H., Yaprak Basaran, E., and Seker, D.Z. (2022). The Potential Contribution of Remotely Sensed Images for Greenmetric Ranking, International Journal of Environment and Geoinformatics, 9(4), 138-150. doi:10.30897/ijegeo.1141366.
Incekara, A.H., and Seker, D.Z. (2021). Rolling Shutter Effect on the Accuracy of Photogrammetric Product Produced by Low-Cost UAV, International Journal of Environment and Geoinformatics, 8(4), 549-553.
Incekara, A.H., Delen, A., Seker, D.Z., and Goksel, C. (2019). Investigating the Utility Potential of Low-Cost Unmanned Aerial Vehicles in the Temporal Monitoring of a Landfill, ISPRS International Journal of Geo-Information, 8(1), 22.
Incekara, A.H., Delen, A., Seker, D.Z., Balik Şanlı, F., and Susam, T. (2019). Using Satellite Imageries and Orthophoto to Quantify Environmental Impact of Mining Activities in Forest Area, Fresenius Environmental Bulletin, 28(2), 806-812.
Incekara, A.H., Seker, D.Z., and Bayram, B. (2018). Qualifying the LIDAR-derived intensity image as an infrared band in NDWI-based shoreline extraction, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(12), 5053-5062.
Incekara, A.H., and Seker, D.Z. (2018). Comparative analyses of the point cloud produced by using close-range photogrammetry and terrestrial laser scanning for rock surface, Journal of the Indian Society of Remote Sensing, 46, 1243-1253.
