Journal of Image and Graphics

CLDS-YOLO: Corn Leaf Disease Detection and Severity Evaluation Using YOLOv9

Elmo Ranolo — 2025-08-08

Corn is one of the main crops around the world, especially in the Philippines, where agriculture is threatened by diseases such as Northern Corn Leaf Blight, Gray Leaf Spot, and Corn Rust. Tropical cyclones and environmental variables worsen these diseases, resulting in dramatically lower crop yields and quality. This study introduces CLDS-YOLO (Corn Leaf Disease Detection and Severity Evaluation Using YOLOv9), which uses YOLOv9 instance segmentation for high-precision disease detection and integrates fuzzy logic for severity evaluation based on the relative leaf area (RLA) and the number of diseased regions. YOLOv9e-seg achieved near-perfect precision for leaf detection (approximately 1.0), but had difficulty identifying diseased regions, with lower precision (~0.7) and recall (~0.3) for segmentation tasks. Despite these limitations, the model's bounding-box tasks demonstrated higher performance, reflected in mAP@0.5 and F1-scores. Severity analysis successfully classified disease levels, facilitating effective crop management. This study highlights the potential of CLDS-YOLO in real-time disease detection and severity evaluation, offering a foundation for an indoor planting system that ensures consistent crop health monitoring and protection from adverse weather conditions.
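The relative leaf area (RLA) named in the CLDS-YOLO abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so this assumes RLA is the diseased pixel count divided by the leaf pixel count, both taken from binary segmentation masks. The function name and the list-of-lists mask format are hypothetical, and the fuzzy-logic severity rules are not reproduced because the abstract does not state them.

```python
def relative_leaf_area(leaf_mask, disease_mask):
    """Return diseased pixels as a fraction of leaf pixels.

    Assumption: both masks are equal-sized 2D lists of 0/1 values,
    e.g. derived from instance segmentation output.
    """
    leaf_pixels = 0
    diseased_pixels = 0
    for leaf_row, disease_row in zip(leaf_mask, disease_mask):
        for leaf, diseased in zip(leaf_row, disease_row):
            if leaf:
                leaf_pixels += 1
                # Count disease only where it overlaps the leaf.
                if diseased:
                    diseased_pixels += 1
    if leaf_pixels == 0:
        raise ValueError("leaf mask is empty")
    return diseased_pixels / leaf_pixels


# Toy masks, invented for illustration only.
leaf = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
disease = [
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],  # this diseased pixel lies outside the leaf and is ignored
]
rla = relative_leaf_area(leaf, disease)
print(f"RLA = {rla:.2f}")
```

A severity stage would then take this ratio and the number of diseased regions as its two inputs.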

WGEF: An Optimized Deep Learning Model for Recognition of Road Surface Condition Using SVM Classifier

RAMYA KRISHNA RAJAVOLU — 2025-08-08

Roads are a major mode of transportation and play a crucial role in day-to-day life, and the public, government, and many applications depend on them. Roads are frequently damaged by weather, and different weather conditions affect the road surface in different ways, which makes identifying road conditions difficult. To address this problem, this paper proposes a model that combines deep learning (DL), optimization, and machine learning (ML) for the recognition and classification of road surface conditions (RSC). The model is designed on a publicly available dataset. The images in the dataset are processed using the wavelet transform (W), and gray-level co-occurrence matrix (GLCM, G) data points are extracted. Detailed feature extraction is performed using the EfficientNet (E) model. The obtained data points are optimized using firefly optimization (F) and finally classified using a support vector machine (SVM). RSC identification is performed using a small number of data points and achieved the best result. The DL-ML combination provided better results in terms of the evaluated metrics, namely sensitivity, specificity, precision, and accuracy. The proposed design achieves an accuracy of 98.15%.
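As a rough illustration of the GLCM step in the WGEF pipeline, the sketch below builds a gray-level co-occurrence matrix for one pixel offset and derives the standard contrast feature from it. The abstract does not say which offsets, gray levels, or GLCM features the authors use, so those choices and the function names here are assumptions.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                matrix[image[y][x]][image[ny][nx]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast: sum of (i - j)^2 weighted by the normalized co-occurrence."""
    total = sum(sum(row) for row in matrix)
    weighted = sum(
        (i - j) ** 2 * count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
    return weighted / total


# Toy 3x3 image with 3 gray levels, invented for illustration.
image = [
    [0, 0, 1],
    [1, 2, 2],
    [2, 2, 0],
]
cooccurrence = glcm(image, levels=3)  # horizontal neighbor, offset (1, 0)
contrast = glcm_contrast(cooccurrence)
print(cooccurrence, contrast)
```

Features like this would be computed per image and passed on to the later optimization and SVM stages.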

Performance Evaluation of Different Optimizers on Alzheimer’s Disease Classification using a Customized Convolutional Neural Network

Pallavi Saikia — 2025-07-21

Alzheimer's disease is a type of dementia that usually affects elderly people. It is a neurological disorder that causes a patient to lose memory gradually over time. The brain of an Alzheimer's disease (AD) patient shrinks due to the accumulation of amyloid plaques in between the neurons. As a result, the neurons, which are the basic building blocks of the brain, lose connections and cannot communicate with each other. A person can be prevented from developing AD if diagnosed at the right time, so it is very important to detect patients with mild symptoms of dementia. In this work, we propose a customized CNN model for classifying Alzheimer's disease. The model has been evaluated with two benchmark datasets, the Kaggle Alzheimer's dataset and the ADNI dataset. The two datasets differ in the number of images. The K-fold technique has been applied to overcome the problem of class imbalance. We have updated the model parameters using six optimizers, namely SGD, SGD with momentum, AdaGrad, AdaDelta, RMSprop, and Adam. Experimental results established that the proposed model outperforms many of the state-of-the-art models on the two benchmark datasets.

A Comparative Study of Convolutional Neural Networks (CNN) Architectures on Microscopic Blood Film Images for Malaria Diagnosis

Chikodili Ugwuishiwu — 2025-07-21

Malaria has been recorded as one of the deadliest diseases globally. Accurate diagnosis is essential for suitable treatment, and the traditional practice of malaria diagnosis has proved inefficient as results depend on the skills of the health personnel. In recent times, deep learning models have proved helpful in the quick detection of malarial parasites. This research focused on developing classification models based on Convolutional Neural Network (CNN) architectures and comparing them to identify the most effective one for automatic malaria parasite detection on thin blood smear images. A dataset of 27,558 digital blood images was collected from the National Institutes of Health (NIH) database in Bangkok, Thailand. The dataset was categorized into parasitized and uninfected cells and was split into training (80%) and validation (20%) sets. The performance metrics include sensitivity, specificity, precision, and F1 score. The models classified thin blood smear digital images as either parasitized or uninfected, with custom InceptionV3 outperforming the VGG19 and custom CNN with an accuracy of 89.85%. The result shows that malaria diagnosis on microscopic thin blood images using deep learning can potentially improve early detection of malaria parasites, which could prevent deaths, reduce the workload of parasitologists, and eliminate other limitations of the traditional malaria diagnostic approaches.
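The metrics listed in the malaria abstract follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard definitions, treating "parasitized" as the positive class. The counts are invented for illustration and are not results from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```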

Robotic Arm: Cutting Edge Sorting Machine for Industrial Optimization

Chiranjeevi Karri — 2025-08-08

This paper introduces an innovative Sorting Robotic Arm (SRA) that uses a SCARA (Selective Compliance Assembly Robot Arm) model for hardware and leverages the advanced object detection capabilities of the Grounding DINO model, enabling real-time identification and sorting of objects with exceptional accuracy and speed. The modular design of this arm facilitates easy customization and scalability, allowing for seamless deployment in various industrial settings. By harnessing the capabilities of SCARA robotics and advanced AI, this paper aims to revolutionize sorting processes, improving productivity, accuracy, and efficiency in industrial operations including recycling facilities, logistics and warehousing, and seed sorting in agriculture.

Vision Transformer for the Categorization of Breast Cancer from H&E Histopathology Images

Md Shakhawat Hossain — 2025-08-08

Breast cancer (BC) is the most frequent form of cancer, accounting for 24.5% of all cancer cases worldwide, with projections estimating 364,000 cases by 2040. Accurate diagnosis and effective categorization of BC are essential for proper treatment planning, patient management, and improved survival. Traditionally, pathologists examine histopathology specimens manually using a microscope to categorize the BC, which is labor-intensive, time-consuming, prone to subjectivity, and constrained by the availability of experts. An automated approach can address these limitations, but previous automated methods failed to achieve sufficient accuracy and reliability. This study presents an automated BC categorization method leveraging whole slide histopathology images and a transformer-based deep learning model. The proposed method uses a cascade of transformers to classify BC using 40X histopathology images, following the taxonomy defined by the BRACS dataset. First, it classifies BC into three primary categories (benign, atypical, and malignant), and subsequently it determines the specific sub-types within each category. The proposed method was validated using two widely recognized datasets: BRACS and BreakHis. On BRACS, it achieved 95.6% accuracy in classifying BC into benign, atypical, and malignant categories, with sub-type accuracies of 94.7% for benign, 98.6% for atypical, and 99.1% for malignant cases. On the BreakHis dataset, the model achieved 93% accuracy for binary benign-malignant classification, with sub-type accuracies of 94% and 91% for benign and malignant cases, respectively. The proposed method outperformed existing methods in accuracy and robustness, making it a promising tool for automated BC diagnosis and classification.

A Novel Deep Learning Approach for Speckle Denoising using Hyper-parameter Tuning

Nilima — 2025-06-25

Ultrasound imaging is one of the key noninvasive diagnostic methods used in medicine today. Many deep learning (DL) speckle denoising algorithms, in particular Autoencoder models and Convolutional Neural Network (CNN) based techniques, tend to overfit, have low accuracy, or perform badly on different sets of data. To help tackle these problems, the study proposes a new Unet-Elu CNN architecture that uses the Exponential Linear Unit (ELU) as its activation function. ELU gives the model non-linearity while facilitating the flow of gradients within it. Batch normalization and dropout layers are added to improve accuracy and prevent overfitting. The proposed framework is evaluated in two stages. In stage 1, the proposed framework is compared with fine-tuned state-of-the-art UNet, UNet_ReLU, CNN Autoencoder, and other filtering methods. In stage 2, transfer learning models are optimized and compared. The proposed framework shows no sign of performance degradation or overfitting when tested on different datasets. The model was evaluated using PSNR, SSIM, and MSE at different levels of speckle noise. The Unet-Elu CNN model achieved a PSNR of 37.76 dB and an SSIM of 98%, which indicates strong denoising performance. The optimized architecture and the ELU activation function of the Unet-Elu CNN model mark a significant improvement in ultrasound image denoising.
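Two of the ingredients named in the speckle denoising abstract have compact standard definitions: the ELU activation and the MSE-based PSNR metric. The sketch below shows both on plain Python lists. The `alpha` default, the 8-bit `max_value`, and the sample pixel values are assumptions made for illustration, not settings taken from the paper.

```python
import math


def elu(x, alpha=1.0):
    """Exponential Linear Unit: identity for x > 0, smooth saturation below."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def mse(reference, estimate):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(reference, estimate)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


# Invented pixel values for a clean patch and a denoised estimate.
clean = [100, 100, 100, 100]
denoised = [110, 90, 100, 100]
print(f"MSE  = {mse(clean, denoised):.1f}")
print(f"PSNR = {psnr(clean, denoised):.2f} dB")
print(f"ELU(-1) = {elu(-1.0):.4f}")
```

Unlike ReLU, ELU keeps a non-zero gradient for negative inputs, which is the gradient-flow property the abstract refers to.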

Dual-stream generative network based staining transfer for biomarker in breast cancer

Ziyang Jin — 2025-08-08

Pathological examination is a crucial standard in cancer diagnosis, with breast cancer being one of the leading causes of morbidity and mortality in recent years, posing a major threat to health. Enhancing pathological examination capabilities has become an important way to save lives and improve patients' quality of life. Common pathological examination methods include Hematoxylin and Eosin (H&E) and Immunohistochemistry (IHC) staining. H&E-stained images alone are often insufficient for cancer diagnosis, while IHC provides more comprehensive information for confirmed diagnosis. To address the challenges of limited IHC resources and high-cost consumption, we aim to generate virtual IHC images from H&E-stained images. In practice, it is difficult to perform multiple stains on the same tissue section, making it hard to obtain pixel-level matched data. To overcome this, we propose a dual-stream generative network that leverages pathological consistency constraints and a pathological representation network to extract pathological information and improve prediction accuracy. The network also incorporates structural similarity constraints and skip connections to enhance structural similarity. Additionally, we use stain unmixing results as annotated data, significantly reducing the workload of pathologists. Extensive experiments demonstrate that our method exhibits superior stability and performance compared to existing approaches.

Dynamic Attention for Enhancement of Weak Contrast Images using Advance ASENet

Jallu Harika — 2025-07-21

Low-light photography frequently results in photos with noise and inadequate brightness, which makes improving low-light images one of the most difficult tasks in computer vision. Numerous techniques have been put forth for this purpose; however, they frequently exacerbate the underlying noise in the input image and fail in extremely low light conditions. As a solution to this challenge, this research proposes the Advanced Attention-Shift Enhancement Network (Adv-ASENet), a promising and innovative technique for effectively improving weak contrast low light (WCLL) images. By using an attention mechanism to selectively focus on the most significant details in a low-light image, this deep learning technique enables the model to prioritise enhancing certain regions while minimising noise amplification in well-lit areas, producing an enhancement result that is more balanced and natural. Weak contrast photos commonly suffer from localised problems and need targeted improvement in specific places where contrast is absent, including low light areas, shadows, or edges, so this method can be very successful. ASENet's dynamic attention blocks can selectively enhance these regions without overprocessing sections that are well-contrasted. The spatial attention module preserves well-exposed sections of the picture while localising enhancement to areas that need contrast boosting. According to the experimental results, the suggested WCLL image enhancement network maintains an appropriate level of network complexity while producing adequate results when compared to current techniques in terms of metrics such as SSIM, PSNR, MSE, and entropy.

Analysis Evolution of Image Caption Techniques: Combining Conventional and Modern Methods for Improvement

Nuha kh. — 2025-08-08

This research aims to study the challenges facing artificial intelligence in the field of generating image captions, a complex process that requires effective integration between computer vision algorithms and natural language processing models to generate accurate and understandable sentences that correctly reflect the image content. Two main approaches are analyzed, and their effectiveness and accuracy are evaluated: traditional methods such as retrieval and fixed templates, and modern frameworks that include encoder-decoder architectures, attention mechanisms, and advanced training techniques. The results show that both conventional and modern methods have their advantages: modern methods perform better and have significantly improved the quality of annotations, while combining the two approaches offers an opportunity to improve the performance of models in general. The study recommends further research to develop strategies that overcome the current limitations and improve performance.

Integrating Spatial Pyramid Pooling for Multi-Scale Brain Tumor Classification in Deep Learning

Karrar Kadhim — 2025-07-21

Deep learning has transformed medical image analysis, including the analysis of many different types of brain tumors. Accurate detection is crucial for effective treatment and contributes to prolonging the life of patients. While MRI is a standard diagnostic tool, manual interpretation can be slow and sometimes error-prone. As such, automatic classification systems based on Convolutional Neural Networks (CNNs) are gaining importance. However, standard CNNs are generally good at capturing only the continuous features within one region of the data, which means they need huge amounts of training samples to work well in practice. They also struggle to capture the diverse shapes, sizes, and positions of brain tumors. To meet this challenge, our paper proposes the SPP-MobileNet model, which integrates a Spatial Pyramid Pooling (SPP) block into the MobileNet architecture. With the SPP layer for multi-scale feature extraction, our classifier is much better at spotting changes in tumor appearance without resizing images. By building this into MobileNet, SPP-MobileNet maintains that model's computational efficiency while boosting classification accuracy. On two MRI datasets, the proposed model outperformed other state-of-the-art methods with an accuracy of 98.86% and perfect precision. Its recall was 97.68%, while the Matthews Correlation Coefficient reached 97.75%. These results suggest that SPP-MobileNet is a powerful tool for brain tumor classification, and it should go some way to improving diagnostic accuracy and speed. In the future, we will focus on tuning the model for more complex types of brain tumors and applying it across various other medical imaging tasks.
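The idea behind Spatial Pyramid Pooling, producing a fixed-length vector from feature maps of any size, can be sketched on a single-channel map in plain Python. The pyramid levels, the use of max pooling, and the function name are assumptions; the abstract does not describe the configuration used inside SPP-MobileNet.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Max-pool a 2D map over an n x n grid for each n in levels.

    The output length is sum(n * n for n in levels), independent of the
    input height and width.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for bins in levels:
        for i in range(bins):
            # Floor for the start and ceiling for the end, so bins cover the map.
            r0 = (i * height) // bins
            r1 = max(((i + 1) * height + bins - 1) // bins, r0 + 1)
            for j in range(bins):
                c0 = (j * width) // bins
                c1 = max(((j + 1) * width + bins - 1) // bins, c0 + 1)
                pooled.append(
                    max(
                        feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1)
                    )
                )
    return pooled


# Two toy maps of different sizes yield vectors of the same length.
small_map = [[1, 2], [3, 4]]
large_map = [[1, 5, 2, 0], [3, 4, 9, 1], [7, 2, 6, 8]]
small_vec = spatial_pyramid_pool(small_map)
large_vec = spatial_pyramid_pool(large_map)
print(small_vec, large_vec)
```

Because both outputs have the same length, a classifier head placed after this layer can accept inputs without resizing them first.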

Development of preprocessing stage for early cervical cancer detection using UNET Diffusion model

Parimala — 2025-05-20

Colposcopy, a medical procedure for examining the cervix, is often used for diagnosing cervical cancer. It generates medical images often containing bright regions known as specular reflections (SR). These bright areas, caused by factors such as moisture and device lighting, can affect further image analysis steps like feature extraction and classification.

We employed a two-stage process, detecting the specular reflection and inpainting the detected regions in the colposcopy images. The first stage creates SR region masks using local thresholding and white top-hat morphological operations on augmented and pre-processed images. The second stage employs Stable Diffusion v1-5, a UNET diffusion model which we additionally train on our dataset of colposcopy images using Dreambooth to inpaint the SR regions detected in the first stage. The inpainting workflow is implemented using the ComfyUI platform, chosen for its flexibility in customizing diffusion pipelines. We utilized the Kaggle dataset to train and evaluate our model.

Our trained UNET diffusion model achieved reconstruction with a minimal loss of 0.307 after 800 optimization steps and an average stable Structural Similarity Index Measure (SSIM) of 0.69. This work represents a pre-processing step towards improving colposcopy image quality, thus enhancing the accuracy of cervical cancer diagnosis.

Recognition of Objects Using Fast RCCN- Hybrid Particle Swarm Firefly Algorithm

Narasimha Rao Yamarthi — 2025-06-25

In image analysis, the recognition and detection of objects is very important for big data analysis. In this paper, a fast and accurate object detector is designed using a faster region-based convolutional neural network (F-RCNN) and hybrid optimization. Initially, the input dataset is processed and segmented using K-means clustering. The FRCNN model extracts the features of the image and retains the detailed pixel information. To improve the neural network, a hybrid optimization technique that combines particle swarm optimization and firefly optimization (PSFA) is used. The model integrates FRCNN and PSFA, which improves the detection process in big data analysis. The proposed model is evaluated using the PASCAL VOC 2007 dataset. Objects such as cars, ships, cats, dogs, horses, and persons are the focus, and objects from different classes are recognized in the images. Integrating optimization with the deep learning method gives a promising improvement in detecting objects of various classes. The mean average precision of object detection is evaluated using the MATLAB software tool. The overall average precision achieved is 84.6%, which is higher than that of other techniques.

Enhancing Smart Wheelchair Navigation through Head Motion Detection: A YOLO-based Approach

Fitri Utaminingrum — 2025-07-21

In 2020, it was recorded that 5 percent of the total population in Indonesia were people with disabilities. People with physical disabilities, especially those who cannot use both their hands and feet, face problems with mobility. Because most wheelchairs are controlled by hand, these users cannot control a wheelchair by themselves. This research aims to create a smart wheelchair that uses object detection models to enable users to navigate their wheelchair. The smart wheelchair is equipped with a camera that captures the user's head movement and moves based on it. Deep learning models are used to detect the head movement. In this research, the variants of three generations of YOLO (You Only Look Once), namely YOLOv5, YOLOv6, and YOLOv7, are compared to find the most suitable one for the system. YOLOv6N has the fastest inference time, at 2.54 ms. All the models are also evaluated on several metrics: precision, recall, mAP@.5, and mAP@.5:.95. There is no large difference between the variants on the first three: precision, recall, and mAP@.5 are all above 0.9. The difference appears in mAP@.5:.95, where the highest score is 0.808 from YOLOv6L and the lowest is 0.703 from YOLOv5N.
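The gap between mAP@.5 and mAP@.5:.95 in the wheelchair study comes from how strictly a predicted box must overlap the ground truth. The sketch below computes intersection over union (IoU) for two boxes and lists which of the ten thresholds from 0.50 to 0.95 the match would pass. The box coordinates are invented, and the `(x1, y1, x2, y2)` corner format is an assumption.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95, as averaged by mAP@.5:.95.
thresholds = [t / 100 for t in range(50, 100, 5)]

ground_truth = (0, 0, 10, 10)
prediction = (0, 0, 10, 8)  # hypothetical, slightly too short
iou = box_iou(ground_truth, prediction)
passed = [t for t in thresholds if iou >= t]
print(f"IoU = {iou:.2f}, counts as a match at {len(passed)} of {len(thresholds)} thresholds")
```

A detector whose boxes are roughly right scores well at the 0.5 threshold alone, while only tightly fitting boxes keep scoring at the stricter thresholds, which is why the averaged metric separates the models more clearly.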

Achieving High-End Image Localization via Causality Infused Renet50 model

Chaitanya — 2025-04-25

The work involved a methodical approach to develop and assess a machine learning model for image localization. We have adapted a pre-trained ResNet50 model integrated with causality to produce high-quality predictions for image localization. The model at its core uses optimized features that have been derived using advanced algorithms. The proposed model has been developed and tested using the ImageNet dataset. We use a fault-proof mechanism in the refined dataset by assessing the relevance of causal relationships through Granger Causality. What sets the model apart is the use of principal component analysis to develop the features used for training. Our model achieves high accuracy, with a validation accuracy of 99.7%.

DAB-UNET: Dual Attention Block UNET Segmentation for Diabetic Retinopathy Utilizing an Encoder-Decoder Residual

Ammar Al-Zubaidi — 2025-05-20

Fundus images play an essential role in ophthalmic diagnostics for the detection of many eye illnesses. The experiment begins with a thorough image preprocessing stage, which includes clipping the circular borders, scaling the image, enhancing the contrast, removing noise, and augmenting the data. A new combined block, consisting of the Attention Block and the Conv-Deconv UNET model, extracts distinctive deep feature representations that help to detect the initial shape of the edges of each lesion. The Attention Block is then applied to improve the robustness and quality of the feature representations derived from a pair of DR images. We propose a Dual Attention Block for the backbone, supplemented with hierarchical bottleneck attention, which we refer to as DAB-UNET. To emphasize the retinal anomalies that are significant for fovea, macula, and DR semantic segmentation in the deteriorated retina, the network includes a unique bottleneck attention block. We trained a Mask-RCNN model with a backbone to eliminate optic disc (OD) regions. Moreover, the proposed block combines self-attention with channel attention in order to highlight these abnormalities. Our results indicate that DAB-UNET is potentially very effective for identifying landmarks even when dealing with different types of retinal degenerative disorders.

Enhancing Pneumonia Classification Performance through CNN Architecture Optimization and Hyperparameter Tuning

Solikhun — 2025-06-25

In the era of health digitalization, early detection of pneumonia through medical image analysis is one of the main challenges in improving the quality of health services. This study aims to enhance pneumonia classification performance using Convolutional Neural Network (CNN) architecture optimization and careful hyperparameter tuning. Through the application of optimization techniques such as Random Search, Bayesian Optimization, and tuning key hyperparameters such as the number of convolution layers, kernel size, dropout rate, and learning rate, this research succeeded in identifying the optimal model configuration. The Proposed Method model shows the best overall performance based on research results involving three models: Proposed Method, VGG16, and ResNet50. With the highest F1 Score value of 0.8440, accuracy of 0.9000, and lowest loss of 0.0977, the Proposed Method achieved an optimal balance between recall and Precision. Although VGG16 has the highest recall, its low precision value shows a tendency to produce more false positives. In contrast, the Proposed Method, with the best Precision of 0.7600 and superior accuracy performance, makes it the most reliable model for classifying pneumonia in this study. Experimental results show a significant increase in classification accuracy compared with conventional approaches, thus supporting further implementation in clinical applications. This study also provides insight into the importance of a systematic approach in designing and optimizing CNN models for disease classification tasks, especially pneumonia.

Enhancing Image Recognition with Quaternion Neural Networks: A Novel Approach to Color Layer Integration

shatha A. baker — 2025-04-25

Image recognition is one of the best-known applications of neural networks. This study explores a new method for image recognition tasks, Quaternion Neural Networks (QNNs), with the goal of expanding on conventional color layer separations. Treating the color layers as linked input values might improve network performance by letting the network learn common parameters across them. The study includes a Python implementation, quaternion network details, and Convolutional Neural Network (CNN) mathematics. Experiments assess the learning process by taking into account the roles of color and structure as well as stability in the presence of noisy images. We establish a proof of concept for the effectiveness of quaternion networks that will open up new avenues for research and possible uses that could outperform or supplement traditional networks.
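The operation that lets a quaternion network treat the color channels as one linked value is the Hamilton product. The sketch below implements it on 4-tuples and encodes an RGB pixel as a pure quaternion with a zero real part, a common convention that is assumed here because the abstract does not describe the encoding. The function names and the sample pixel are hypothetical.

```python
def hamilton_product(p, q):
    """Quaternion product of p = (a1, b1, c1, d1) and q = (a2, b2, c2, d2)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def rgb_to_quaternion(r, g, b):
    """Encode a pixel as a pure quaternion: 0 + r*i + g*j + b*k."""
    return (0, r, g, b)


i_unit = (0, 1, 0, 0)
j_unit = (0, 0, 1, 0)
print(hamilton_product(i_unit, j_unit))  # i * j
print(hamilton_product(j_unit, i_unit))  # j * i, opposite sign

# A single quaternion weight mixes all three color channels at once.
pixel = rgb_to_quaternion(2, 3, 5)
weight = (1, 1, 0, 0)  # hypothetical weight, not a trained value
mixed = hamilton_product(weight, pixel)
print(mixed)
```

One quaternion weight has four parameters but touches every channel of the pixel, which is the parameter sharing across color layers that the abstract alludes to.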

Categorisation of Vegetation Using Machine Learning and Remote Sensing Methods

Mansi Kambli — 2025-04-25

Agriculture is the primary source of income for farmers, so crop growth must be given the utmost attention and closely monitored. Precision agriculture is a farming management approach that enhances the sustainability of agricultural output by monitoring, measuring, and adapting to temporal and spatial variability using multispectral satellite data. Using satellite imagery, farmers can map, organize, and analyze field data and remotely monitor their crops. Sugarcane, a cash crop, is used in this study because researchers are focusing more on success in sugarcane development. Detecting dense and sparse vegetation on farmers' individual plots helps to establish whether a plot offers good soil and water for sugarcane to grow. Sparse vegetation indicates that the area has slopes and does not retain water, which creates problems for sugarcane growth. Remote sensing spectral bands and vegetation indices such as NDVI, EVI, and RVI are applied to the sugarcane canopy. As shown by the research in this paper, EVI works for dense canopy and RVI for sparse vegetation. Both tabular data and TIFF and JPG images are used in the analysis. Machine learning further helps to detect sparse and dense vegetation, and the accuracy of all the classifiers is compared.

This research work surveys different machine learning techniques applied to remotely sensed data of sugarcane crops. Machine learning techniques are further explored and implemented, for imagery both with and without cloud cover, to monitor the crop and detect dense and sparse vegetation using vegetation indices. The same plot can be monitored each month to detect change, and in future work the cause of sparse vegetation in a particular plot can be diagnosed with the help of enhanced vegetation indices. The normalized difference vegetation index (NDVI) is used in remote sensing to find healthy vegetation, but it saturates at the grand growth stage. The novel method therefore uses the enhanced vegetation index and the ratio vegetation index, together with machine learning models, to monitor the crop at the grand growth stage, as shown in this research work.
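The three vegetation indices named in the sugarcane abstract are simple band ratios. The sketch below uses their commonly cited forms, with EVI in its standard coefficient set (gain 2.5, coefficients 6 and 7.5, offset 1). Whether the authors use exactly these coefficients is an assumption, and the reflectance values are invented.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def rvi(nir, red):
    """Ratio vegetation index (simple NIR to red ratio)."""
    return nir / red


def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Hypothetical surface reflectances for one pixel (range 0 to 1).
nir, red, blue = 0.5, 0.1, 0.05
print(f"NDVI = {ndvi(nir, red):.3f}")
print(f"RVI  = {rvi(nir, red):.3f}")
print(f"EVI  = {evi(nir, red, blue):.3f}")
```

NDVI is bounded between -1 and 1, so it flattens out over very dense canopy, whereas RVI is unbounded and EVI includes a blue-band correction, which is consistent with the abstract's point about monitoring at the grand growth stage.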

Investigating Hybrid Quantum-Assisted Classical and Deep Learning Model for MRI Brain Tumor Classification

Anandhavalli Muniasamy — 2025-02-27

Brain tumors pose significant diagnostic and therapeutic challenges and are associated with high rates of illness and death. Magnetic Resonance Imaging (MRI) is now a commonly employed diagnostic technique for detecting brain tumors. However, accurately categorizing different types of tumors still poses a considerable difficulty. Deep learning techniques, specifically Convolutional Neural Networks (CNNs), have recently demonstrated encouraging outcomes in accurately categorizing brain cancers through the analysis of MRI data. Nevertheless, the effectiveness of CNNs might be constrained by the size and complexity of the dataset. This study illustrates the application of a hybrid quantum-classical convolutional neural network (HQC-CNN) and a DenseNet121 model to brain tumor classification. According to the experimental results, the HQC-CNN and DenseNet121 models achieved accuracies of 88% and 94%, respectively, in classifying brain tumor images.

Detection of Corals, Seagrass, and Seaweeds using YOLOv9 Instance Segmentation with Image Augmentation

Elmo Ranolo — 2025-05-20

This study investigates the extent to which the YOLOv9e instance segmentation model classifies and detects different types of marine objects, such as corals, marine life, seagrass, and seaweed. This study uses image augmentation techniques to improve the detection and classification of objects using YOLOv9. The study emphasizes the need to examine the distribution of classes within the dataset, as class imbalances can have a major impact on the model's performance. Throughout training, the model showed a constant decrease in loss functions such as box loss, segmentation loss, and classification loss, demonstrating effective learning and generalization. The precision and recall metrics improved significantly, with a mean Average Precision (mAP) of 0.883 at an IoU threshold of 0.5, validating the model's high accuracy across classes. The F1-Confidence Curve study yielded an overall F1 score of 0.84 at a confidence threshold of 0.534, highlighting the model's robustness in achieving a balance between precision and recall. The results suggest that while the model excels in detecting corals, seagrass, and seaweed, it faces challenges in accurately identifying marine life, pointing to the need for additional refinement to address class imbalances.

Adapted Fast Gradient Projection Algorithm for Magnetic Resonance Image Denoising

Manar Al-Abaji — 2025-03-14

The critical objective of image denoising is to obtain a visually pleasing image that preserves the visually essential details from its noisy counterpart. Magnetic resonance (MR) images are obtained with degradations, and one frequent degradation is Rician noise, which occurs due to temperature fluctuations or technical errors. This degradation distorts the MR image details, leading to incorrect medical diagnoses and difficulties automating computerized tasks. Various existing denoising methods fail to attenuate the noise properly, blurring or removing fine details from the processed images. Thus, an adapted fast gradient projection (AFGP) algorithm is proposed in this study for MR image denoising. The proposed algorithm can automatically compute the regularization parameter for each MR image via the local image information. Moreover, a detail-emphasizing phase is applied at each iteration to maintain the structure and delicate features. The performance of the proposed AFGP algorithm is assessed with a dataset of real noisy images and compared with various denoising algorithms, and the results are evaluated using three sophisticated accuracy measures in addition to runtime. Ultimately, the proposed approach yielded satisfactory outcomes, surpassing all comparable techniques with relatively fast runtimes.

MSP2P: Multi-Scale Point-based Approach for Optimal Crowd Localization Through Perspective Analysis

David Redo Nieto — 2025-02-27

Image-based individual localization in densely populated scenes presents practical advantages beyond mere head counting, as it enables a broader range of high-level tasks in crowd analysis. Crowd image data contain drastic changes in head sizes caused by the perspective effect. This specific challenge has not been addressed in the literature, as existing localization methods do not consider multi-scale features. To alleviate this issue, we propose a novel Multi-Scale Point-to-Point Network (MSP2P) in which a set of experts is in charge of predicting head locations at each view perspective level. However, the training procedure requires ground-truth scale information for precise one-to-one matching. For this reason, we develop a simple yet effective method that uses neighbor density information to estimate the scale associated with each head location. Extensive experiments demonstrate that our method outperforms most state-of-the-art methods on relevant counting benchmarks without compromising performance.
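One plausible way to turn neighbor density into a per-head scale is the mean distance to the k nearest annotated heads. The sketch below works under that assumption; the abstract does not specify the authors' estimator, and the function name, the value of k, and the sample points are hypothetical.

```python
import math


def estimate_scales(points, k=3):
    """Mean distance from each point to its k nearest neighbours.

    Under the perspective assumption, small values indicate a dense
    region (small, distant heads) and large values a sparse one.
    """
    scales = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scales.append(sum(nearest) / len(nearest) if nearest else float("inf"))
    return scales


# Hypothetical head annotations: a tight cluster and two far-apart points.
heads = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (20, 10)]
scales = estimate_scales(heads, k=2)
```

The resulting values could then be binned into perspective levels so that each ground-truth point is matched only against the expert responsible for its level.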

Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models

Bach Nguyen — 2025-05-19

Underwater vision is crucial for autonomous underwater vehicles (AUVs), and enhancing degraded underwater images in real time on a resource-constrained AUV is a key challenge, both because of factors like light absorption and scattering and because of the model complexity needed to resolve them. Traditional image enhancement techniques lack adaptability to varying underwater conditions, while learning-based methods, particularly those using convolutional neural networks (CNNs) and generative adversarial networks (GANs), offer more robust solutions but face limitations such as inadequate enhancement, unstable training, or mode collapse. Denoising diffusion probabilistic models (DDPMs) have emerged as a state-of-the-art approach in image-to-image tasks, but they are computationally intensive when used for underwater image enhancement (UIE), as in the recent UW-DDPM solution. To address these challenges, this paper introduces UW-DiffPhys, a novel physical-based and diffusion-based UIE approach. UW-DiffPhys combines lightweight physical-based UIE network components with a denoising U-Net to replace the computationally intensive distribution transformation U-Net in the existing UW-DDPM framework, reducing complexity while maintaining performance. Additionally, the Denoising Diffusion Implicit Model (DDIM) is employed to accelerate inference through non-Markovian sampling. Experimental results demonstrate that UW-DiffPhys achieved a substantial reduction in computational complexity and inference time compared to UW-DDPM, with competitive performance in key metrics such as PSNR, SSIM, and UCIQE, and an improvement in the overall underwater image quality metric UIQM. The implementation code can be found at the following repository: https://github.com/bachzz/UW-DiffPhys
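The DDIM acceleration can be sketched with the generic deterministic update (eta = 0), which moves from a noisy sample at one timestep to any earlier timestep in a single step. This is the general DDIM rule written for a flat list of values, not the UW-DiffPhys code; the function name and all numeric values are assumptions, and `eps_pred` stands in for the output of a denoising network.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) on a flat list of values.

    alpha_bar_* are cumulative products of the noise schedule;
    eps_pred stands in for the denoising network's noise estimate.
    """
    out = []
    for x, eps in zip(x_t, eps_pred):
        # Estimate of the clean signal implied by the predicted noise.
        x0_pred = (x - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
        # Re-noise that estimate to the (less noisy) earlier timestep.
        out.append(math.sqrt(alpha_bar_prev) * x0_pred
                   + math.sqrt(1.0 - alpha_bar_prev) * eps)
    return out


# Hypothetical values: build x_t from a known x0 and noise, then step back.
x0 = [0.2, -0.5, 1.0]
eps = [0.3, 0.1, -0.7]
a_t, a_prev = 0.5, 0.9
x_t = [math.sqrt(a_t) * x + math.sqrt(1 - a_t) * e for x, e in zip(x0, eps)]
x_prev = ddim_step(x_t, eps, a_t, a_prev)
```

Because the update does not require consecutive timesteps, sampling can visit a short subsequence of the training schedule, which is where the reduction in inference time comes from.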

Automatic Detection and Classification of Cerebral Microbleeds using 3D CNN

Muhammad Mohsin Khan — 2025-06-25

Cerebral microbleeds (CMBs) are tiny foci of hemorrhage in the brain parenchyma that are smaller than 5 (to 10) mm in size. The presence of CMBs is implicated in the pathophysiology of cognitive impairment, dementia, radiation-induced vascular injury, traumatic brain injury, hypertensive microangiopathy, and aging. On brain MRI scans, CMBs appear as hypointense foci, most notable on T2*-weighted or susceptibility-weighted imaging (SWI). Detecting these tiny microbleeds with the naked eye is a difficult and time-consuming task for radiologists. In this study, we developed an algorithm for the automatic detection of CMBs. We applied a two-step strategy. In the first step, we applied You Only Look Once (YOLO) to a pre-processed 2D image dataset to detect CMBs. The locations of the detected CMBs were then used to segment 3D patches from the original SWI volumes in the datasets, and these patches served as inputs to a CNN. In the second step, we reduced the number of False Positives (FP) and improved our classification accuracy using a 3D CNN. We used two datasets comprising 979 patients: 879 for model training and the remainder for independent validation. We were able to achieve an accuracy of 81% and reduce the FP_avg to 0.16.
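The hand-off between the two steps, cutting a 3D patch around each 2D detection, can be sketched as an index computation. The example assumes a cubic patch of fixed side length that is shifted inward at the volume border; the patch size, volume shape, and function name are hypothetical, since the abstract does not state them.

```python
def patch_bounds(center, volume_shape, size):
    """Return per-axis (start, stop) bounds of a cube of side `size`
    centred on `center`, shifted inward so it stays inside the volume."""
    bounds = []
    for c, dim in zip(center, volume_shape):
        if size > dim:
            raise ValueError("patch is larger than the volume")
        start = c - size // 2
        start = max(0, min(start, dim - size))   # clamp to the valid range
        bounds.append((start, start + size))
    return bounds


# Hypothetical SWI volume shape (slices, rows, cols) and a detection centre
# taken from a 2D bounding box on slice 3.
shape = (40, 256, 256)
center = (3, 120, 250)
bounds = patch_bounds(center, shape, size=16)
```

Shifting the patch rather than zero-padding keeps every input the same size and filled with real voxels; padding would be an equally reasonable alternative.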

Aflutter Craft: Neural Art Transfer Platform

Rawad Abdulghafor — 2024-11-19

Image style transfer is a neural network technique that applies the style of an existing image to another image while preserving that image’s content. Various approaches to style transfer have sought to speed up the process or provide more appealing results, one of which is the use of style attentional networks. Attention is an algorithm that scores different parts of an image based on their importance in the overall image; it helps neural networks distinguish the important parts of an image. We use attention to identify the parts of an image that represent its style, so as to apply an overall style rather than a mask, and to conserve the parts of the content that are crucial to its identity (a visible object, a focused subject, etc.). Aflutter Craft enhances an existing algorithm that uses attention for style transfer. Results show that our algorithm uniformly applies important parts of the style while simultaneously preserving the subject of the content image. Results from Aflutter Craft were chosen as the most visually appealing by 38.4% of survey participants when compared with four other implementations. In addition, this paper introduces a cross-platform application with a general application programming interface (API) capable of performing style transfer from anywhere.

Integrating Multi-Scale Feature Extraction into EfficientNet for Acute Lymphoblastic Leukemia Classification

Fallah Najjar — 2025-02-14

This paper proposes integrating a multi-scale feature extraction technique into the EfficientNetV2B0 architecture for the classification of acute lymphoblastic leukemia (ALL), one of the most prevalent types of leukemia. Our proposed method, Multi-Scale Enhanced EfficientNet (MSEENet), increases sensitivity and accuracy on the more nuanced subtypes of ALL through a series of architectural innovations. The study develops a general methodology consisting of dataset preparation, advanced preprocessing techniques, and a focused training regimen. Emphasis is placed on the model's ability to extract features from images of blood samples by combining a pre-trained EfficientNet with an Inception module, which captures features at multiple scales and selectively boosts their saliency for sound classification. We evaluate the efficacy of the proposed MSEENet through a detailed training and validation process on a substantial dataset representing different appearances of ALL. The model performed better on almost all key metrics, achieving an accuracy of 98.77%, a precision of 98.99%, a recall of 98.49%, a Matthews correlation coefficient of 98.34%, and an F1-score of 98.72%. These findings underline how useful MSEENet may be in ALL diagnosis and classification, putting a practical, highly precise, reliable, and efficient instrument into health professionals' hands and thereby supporting the development of stratified or personalized cancer treatments.

Graph Edge Classification for Keypoint Grouping in Multi-Person Pose Estimation

Marwan Morsi — 2025-03-14

Human pose estimation (HPE) is an essential component of many computer vision (CV) tasks that involve humans. It is concerned with detecting people and pinpointing key anatomical body parts in a signal to obtain knowledge about the human subjects. 2D multi-person HPE is a sub-type used with 2D signals, such as RGB images, and is a precursor to many CV tasks such as action recognition, prediction, and 3D HPE. Of the several methods used in 2D HPE, bottom-up multi-person HPE is the more difficult approach, mainly due to joints-to-person assignment. We consider the problem of keypoint grouping in bottom-up multi-person 2D HPE, where assembling joints into person instances is modeled as a graph edge classification problem. Recently, several studies have used graph neural networks (GNNs) to tackle the keypoint grouping problem, making the process learnable. However, some elements of the graph structure were overlooked. Hence, this paper aims to address the bottom-up grouping problem by introducing a novel supervised graph edge-classification model that incorporates edge features into a GNN encoder, is small and lightweight, and makes relatively few assumptions about the input signal. Testing is done on the crowdPose-test dataset, yielding an average precision (AP) score of 46.1%, which is comparable to other similar bottom-up frameworks.
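Once edges have been classified, one simple way to recover person instances is to keep the edges predicted as "same person" and take connected components. The sketch below assumes that decoding step and uses a union-find structure; the abstract does not describe the authors' decoding, and the function name, threshold, and scores are hypothetical stand-ins for a GNN edge classifier's output.

```python
def group_keypoints(num_keypoints, edges, scores, threshold=0.5):
    """Group keypoints into person instances from per-edge scores.

    An edge whose score reaches `threshold` is treated as "same person";
    connected components of the kept edges form the groups.
    """
    parent = list(range(num_keypoints))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for (a, b), score in zip(edges, scores):
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for k in range(num_keypoints):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())


# Hypothetical scores standing in for a GNN edge classifier's output.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
scores = [0.9, 0.8, 0.1, 0.95, 0.2]
people = group_keypoints(5, edges, scores)
```

Connected components are sensitive to a single false-positive edge, which merges two people into one, so the quality of the edge classifier largely determines the quality of the grouping.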

Enhancing Monkeypox Diagnostics: Exploring the Potential of EfficientNet and Big Transfer

Sharia Arfin Tanim — 2024-07-25

The study presents BiT-EfficientNet, a novel hybrid model developed specifically for the precise classification of monkeypox lesions in skin images. By combining EfficientNet B6 and Big Transfer (BiT-M-R50x1), the model demonstrates exceptional performance in recognising patterns and managing visual features. BiT-EfficientNet outperforms existing models, achieving a precision of 98.25%, recall of 95.48%, F1 score of 96.84%, and accuracy of 96.86%. Comparative analysis positions it as a strong contender. Careful parameter optimisation yields a highly accurate model, with a training accuracy of 99.14%, and empirical evaluation confirms its robustness. The purpose of this research is to investigate the use of hybrid models in dermatological diagnostics and to demonstrate their potential to advance medical image classification. The findings have a substantial influence on improving diagnostic accuracy for illnesses such as monkeypox, which can lead to prompt interventions by medical professionals.

A Review on Medical Image Applications Based on Deep Learning Techniques

Ali ABDULWAHHAB — 2024-07-05

The integration of deep learning in medical image analysis is a transformative leap in healthcare, impacting diagnosis and treatment significantly. This scholarly review explores deep learning's applications, revealing limitations in traditional methods while showcasing its potential. It delves into tasks like segmentation, classification, and enhancement, highlighting the pivotal roles of CNNs and GANs. Specific applications, like brain tumor segmentation and COVID-19 diagnosis, are deeply analyzed using datasets like NIH Clinical Center's Chest X-ray dataset and BraTS dataset, proving invaluable for model training. Emphasizing high-quality datasets, especially in chest X-rays and cancer imaging, the article underscores their relevance in diverse medical imaging applications. Additionally, it stresses the managerial implications in healthcare organizations, emphasizing data quality and collaborative partnerships between medical practitioners and data scientists. This review article illuminates deep learning's expansive potential in medical image analysis, a catalyst for advancing healthcare diagnostics and treatments.