Decentralizing Image Processing Through the Integration of Federated Learning and Convolutional Neural Networks

Review article
DOI: https://doi.org/10.60797/IRJ.2025.158.56
Issue: № 8 (158), 2025
Submitted: 09.06.2025
Accepted: 14.08.2025
Published: 18.08.2025

Abstract

The convergence of Federated Learning (FL) and Convolutional Neural Networks (CNNs) precipitated a revolution in federated image processing. The success of CNNs has long rested on centralized data, which raised serious issues of data ownership, security, and privacy. Federated learning emerged as the solution, addressing privacy without degrading model performance by allowing multiple clients to collaboratively train CNN models without sharing their local data. This study traces the evolution of FL with CNNs, from early federated CNN models through the first real-world system implementations to subsequent work on non-IID data distributions, communication inefficiencies, and system heterogeneity. The work covers important milestones, significant technological innovations, and real-world implementations in fields such as smart cities, healthcare, and environmental monitoring. In addition, this study examines model efficiency, privacy protection, and robustness in decentralized contexts and illustrates the innovations developed to address them. It gives a broad overview of how federated learning reshaped distributed image processing, considers how FL-CNN systems are constructed, and offers a basis for further extension and application.

1. Introduction

Artificial intelligence (AI) technologies have become increasingly common in everyday life, especially in image processing. Medical diagnostics, traffic monitoring, and smartphone cameras are some of the AI systems that rely on images. All of these systems depend on neural models and enormous datasets. The most popular architecture for these tasks is the Convolutional Neural Network (CNN), which excels at recognizing spatial patterns, textures, and structures in visual data. CNNs have attained record-breaking accuracy in applications like object detection, image classification, and semantic segmentation. Generally, CNNs are trained in centralized learning systems in which all data is gathered and processed on one central server. Although effective, this centralized model has significant limitations: it exposes sensitive data to breaches, requires enormous bandwidth, and can result in regulatory non-compliance, especially under data protection legislation such as the GDPR or HIPAA.

Addressing these limitations, Google introduced Federated Learning (FL) in 2016: a system that enables machine learning (ML) models to be collaboratively trained on numerous devices without sending the data to a central server. Each device trains a local model on its own data and then shares only model updates, such as weights or gradients, with the server. This improves privacy, reduces communication overhead, and ensures scalability. The combined application of CNNs and FL offers privacy-preserving, high-speed image processing across networks. Combining the two, however, raises many technical, architectural, and ethical problems. This paper provides a full overview of FL with CNNs (FL-CNNs): its background, key milestones, technical problems, real-world applications, and future breakthroughs.

2. Background and motivation

2.1. Centralized CNN Overview

The development of CNNs has historical origins in early neural network research, but it was only in 2012 that CNNs became the focal point in reshaping computer vision. The 2012 breakthrough came from Krizhevsky et al. with their work “AlexNet”, which convincingly beat all other models in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Because of the power of deep convolutional layers combined with GPU computation, AlexNet confirmed the dominance of deep learning for image classification and opened the door to faster development of CNN architectures.


Figure 1 - Architecture of AlexNet

Figure 1 shows AlexNet's architecture of five convolutional and three fully connected layers. The network also utilized ReLU activations, overlapping max-pooling, and dropout regularization, and it was one of the first applications of parallel GPU processing. The depth of the architecture and the use of GPUs turned large-scale image classification into a new paradigm of deep learning. A series of increasingly sophisticated CNN architectures followed AlexNet. VGGNet in 2014 focused on depth with a minimal design and small 3×3 convolution filters, achieving superb performance through simplicity. In the same year, GoogLeNet introduced the Inception module, an innovative design that balanced depth and efficiency by reducing the parameter count while maintaining high accuracy. Residual connections, the key breakthrough of ResNet in 2015, solved the vanishing-gradient problem and allowed training extremely deep networks. Together, these advances opened new horizons in computer vision performance.

Although centralized CNNs perform remarkably well, they possess certain serious drawbacks. Centralized CNNs depend on enormous quantities of labeled data and centralized computing capacity, such as large GPU or TPU clusters. Centralized learning assumes that all data can be sent to and kept in one central location, an assumption that is questionable where data is sensitive or sources are decentralized. Sharing privacy-sensitive image data with a centralized server can create privacy threats, especially in application domains such as surveillance or healthcare. Data transfer across national or organizational boundaries may violate data protection laws and add legal complexity. Centralized deployments also include a single point of failure: compromising or disabling the central server halts the entire learning pipeline. Finally, transmitting enormous volumes of image data over networks causes bandwidth bottlenecks and latency, so centralized CNNs are poorly suited to real-time or low-resource environments. These limitations motivated researchers to study decentralized solutions, which ultimately paved the way for FL.

2.2. Federated Learning as a Paradigm Shift

The fundamental idea behind FL is decentralized training: devices such as smartphones or sensors run ML models locally on their own data. Rather than sending raw data to a single server, devices share learned parameters or model updates. The server combines these updates, usually by averaging, into a global model, which is then sent back to the clients for further local training. This differed sharply from classic ML paradigms and allowed training across multiple edge devices without ever sending sensitive data off the user's device. Figure 2 illustrates the FL architecture, where multiple edge devices (clients) each train a local copy of the model on local data.


Figure 2 - Basic Federated Learning Architecture
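
As a minimal sketch of the server-side aggregation step described above (names and values are illustrative and not tied to any particular FL framework), the following Python snippet averages client parameter vectors weighted by local dataset size:

```python
import numpy as np

def aggregate(client_weights, client_sizes):
    """One federated round on the server: weighted average of client models."""
    total = sum(client_sizes)
    # Weight each client's parameters by its share of the total training data
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three hypothetical clients with different amounts of local data
clients = [np.array([0.2, 0.5]), np.array([0.4, 0.1]), np.array([0.3, 0.3])]
sizes = [100, 50, 150]
global_model = aggregate(clients, sizes)  # broadcast back for the next round
```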

FL had some major strengths. It offered much stronger privacy, since raw data never left the local device, reducing the risk of data tampering or leakage. It also cut communication costs, because only model parameters, far smaller than raw datasets, needed to be transmitted. FL was also scalable, able to train over millions of clients in parallel. These characteristics made FL extremely suitable for mobile applications such as keyboard prediction and face recognition, where user data is heterogeneous and sensitive, and where improving model accuracy means learning from this heterogeneous, distributed data.

While useful, FL came with a new range of technical issues. Prominent among them was non-IID (non-Independent and Identically Distributed) data, meaning that the data distribution differs significantly across clients. This slows down model convergence and reduces accuracy. Synchronizing model updates in real time was another difficulty, especially where sparse or unreliable internet connectivity caused communication delays. Client heterogeneity was also an issue: the devices participating in FL tend to differ in computing capability, battery life, and bandwidth, which complicates coordination and load balancing.
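
To make the non-IID problem concrete, researchers often simulate label-skewed client splits with a Dirichlet distribution. The sketch below is a common simulation recipe rather than a method from this article; the function and parameter names are ours:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with Dirichlet label skew.
    Smaller alpha -> more non-IID (each client dominated by few classes)."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        # Proportion of class c assigned to each client
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(part.tolist())
    return client_idx

# e.g., with MNIST labels: dirichlet_partition(train_labels, n_clients=10, alpha=0.1)
```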

As researchers extended FL, they eventually combined it with CNNs, especially for computer vision applications. CNNs, having achieved such success in visual analysis and recognition, had much to gain from the privacy-oriented, decentralized nature of FL. With the union of FL and CNNs, the promise of safe, decentralized image processing became reality for privacy-critical applications such as medicine, autonomous vehicles, and intelligent surveillance, without compromising user data security.

To better understand the shift brought by FL, it is useful to contrast it explicitly with the traditional centralized learning paradigm. Table 1 summarizes the key differences. The comparison illustrates how FL addresses the limitations of centralized solutions by offering more privacy-preserving, scalable, and fault-tolerant systems, particularly needed in real-world, decentralized environments.

Table 1 - Comparison of Centralized Learning and Federated Learning

Aspect                 | Centralized Learning                       | Federated Learning
Data Movement          | Data is transferred to a central server   | Data stays on local devices
Privacy                | High risk of data leakage                  | Improved data security and privacy
Communication Overhead | High, particularly for image data          | Lower; only model parameters are shared
Scalability            | Limited by bandwidth and server capacity   | Highly scalable across distributed clients
Fault Tolerance        | Single point of failure                    | Higher, with distributed nodes
Deployment             | Easier in controlled environments          | Suited to real-world, decentralized systems

3. CNN Integration in Federated Learning

3.1. Early Experiments

Ever since the introduction of FL in 2016, researchers quickly explored its intersection with deep learning models. Attempts to merge CNN architectures with FL began in 2017 and 2018, opening a new frontier for privacy-preserving distributed image processing. CNNs were a natural choice for FL because they were already widely used in image processing. As image data typically contains private information, especially in healthcare and smart surveillance applications, federated CNNs offered a means of accessing powerful visual recognition without compromising data privacy.

One of the earliest and most influential implementations was McMahan et al.'s Federated Averaging (FedAvg) in 2017. FedAvg, as a model-agnostic approach, demonstrated outstanding applicability to CNNs, supporting successful training on non-IID image data distributed across clients. The experiments were conducted on datasets such as MNIST, CIFAR-10, and Fashion-MNIST. Figure 3 shows the FedAvg algorithm workflow, with emphasis on the iterative process of local training, parameter aggregation, and global model update.


Figure 3 - FedAvg algorithm

Note: proposed by McMahan et al.
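
A compact sketch of the FedAvg loop is given below. It follows the structure in Figure 3 under simplifying assumptions: models are flat NumPy parameter vectors, and local_train is an assumed callable that runs a few epochs of local SGD and returns the updated weights together with the client's sample count.

```python
import numpy as np

def fedavg(global_w, clients, local_train, rounds=10, client_frac=0.1, seed=0):
    """Simplified FedAvg: sample a fraction of clients each round, train
    locally, then average the returned weights, weighted by data size."""
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        m = max(1, int(client_frac * len(clients)))
        chosen = rng.choice(len(clients), size=m, replace=False)
        # Each entry is (updated_weights, n_samples) from one client
        results = [local_train(global_w.copy(), clients[k]) for k in chosen]
        total = sum(n for _, n in results)
        global_w = sum(w * (n / total) for w, n in results)
    return global_w
```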

These initial experiments proved that CNNs could achieve competitive accuracy in the federated setting, albeit with slower convergence and greater communication expense compared to centralized approaches. However, the first applications of CNNs to FL settings revealed a range of major issues. Non-IID data distribution across clients was one main challenge. In federated environments, every client generally holds data that differs widely in quantity and type from other clients, and this resulted in a significant performance decrease for CNNs. CNNs, generally robust on well-balanced and centralized data, could not achieve the same accuracy on such divergent data.

Communication bottlenecks were another crucial challenge. CNNs have an enormous number of parameters, and exchanging full model updates between the server and multiple clients put enormous pressure on the network. In addition, scarce resources were an obstacle: most client devices involved in FL lacked the computational resources and memory to locally train large CNNs. These constraints required re-designing conventional training paradigms to better fit edge environments.

Early studies addressed these challenges with a wide array of optimization strategies. Model compression and quantization techniques were investigated to shrink transmitted updates and ease the bandwidth burden. Partial model training, in which only some CNN layers are updated while others remain static (also referred to as layer freezing), was proposed to minimize client-side computation, as sketched below. Further, adaptive client selection protocols were used to avoid slow or resource-constrained clients.
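
The sketch below illustrates layer freezing on a client, assuming PyTorch and a torchvision MobileNetV2 backbone (our choice for illustration; the early studies used various architectures). Only the classifier head is trained, so only its parameters need to be computed and transmitted:

```python
import torch
import torchvision

# Hypothetical client-side setup: freeze the convolutional backbone,
# train only the final classifier layer locally.
model = torchvision.models.mobilenet_v2(num_classes=10)

for p in model.features.parameters():   # freeze convolutional layers
    p.requires_grad = False

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01
)
# Only model.classifier weights change, so only they are uploaded to the
# server, shrinking both client computation and update size.
```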

By December 2018, federated CNNs had been demonstrated to perform strongly on benchmark datasets and showed promise for privacy-preserving distributed image recognition applications. From 2018 to 2020, Federated Learning using CNNs moved from proof-of-concept experiments to real-world applications, particularly in distributed image processing. The need for secure processing of visual data, particularly in healthcare, smart cities, and mobile computing, encouraged applied research on FL-CNN models.

3.2. Key Domains for Federated CNNs

- Healthcare and Medical Imaging

Among the strongest application domains was medical image processing, where data privacy is most important. Research centers and hospitals began using FL-CNN architectures to jointly learn models for tumor detection, organ segmentation, and disease classification without disclosing patient data. Federated approaches with CNNs on CT and MRI scans, for example, enabled institutes to develop precise models for brain tumor detection on geographically dispersed datasets.

- Mobile Applications

For mobile scenarios, researchers applied federated CNNs to on-device tasks such as face detection, scene parsing, and photo classification. These tasks involved data captured by users' mobile cameras, which is personal and not publicly disclosed. Federated frameworks enabled models to be trained jointly across devices without infringing on users' privacy.

- Smart Cities and Surveillance 

Federated CNNs were also applied in smart city solutions for use cases like traffic monitoring, crowd counting, and anomaly detection. Since surveillance images concern individuals, FL offered a privacy-respecting alternative to conventional centralized processing. Experiments demonstrated how local devices, such as edge nodes or cameras, could contribute to CNN learning without uploading raw video frames.

4. Technical Challenges and Solutions

One of the most significant FL challenges is heterogeneity in data and systems. Clients' data is heterogeneous, often non-IID in both size and distribution, and FL devices differ widely in processing capacity, battery life, and network connectivity. Several techniques have been proposed to deal with these challenges. Personalized federated models let each client adapt the shared model to its own local data. Domain adaptation techniques bring data representations closer together across devices. Clustering is another method, grouping clients with similar data distributions to ensure efficient and accurate training, as in the sketch below.
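
One simple way to realize the clustering idea, assuming clients can share coarse label histograms (itself a privacy trade-off), is sketched here; the function names and the use of scikit-learn's KMeans are illustrative choices, not a prescribed method:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_clients(client_labels, n_classes, n_clusters=3):
    """Group clients by label distribution; client_labels is a list of
    1-D integer label arrays, one per client."""
    # Normalized label histogram: a cheap proxy for a client's data distribution
    hists = np.stack([
        np.bincount(y, minlength=n_classes) / len(y) for y in client_labels
    ])
    # Clients in the same cluster can then train a better-matched model
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(hists)
```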

Communication efficiency is another issue, particularly because CNNs are heavily parameterized. Sending full model updates over slow or unreliable networks is usually infeasible in real-world FL deployments. Gradient quantization and sparsification compress the payload to cut communication cost; updates can also be compressed with pruning or low-rank approximations. Model optimization must further contend with the constrained resources of edge devices, which usually lack the computational capacity for deep CNN training. To overcome this constraint, lightweight CNN architectures such as MobileNet, EfficientNet, and SqueezeNet, which require less computation and memory, are employed. Split learning is another powerful method, in which shallow model layers are trained locally on client devices while deeper layers are handled by a central server. Partial updates are also used, in which only some layers of the model are updated in each training round, decreasing the computational load on client devices. A sparsification sketch follows below.

Lastly, privacy remains a paramount issue for FL. Although FL enhances privacy via data localization, it does not totally eliminate risk. Model inversion attacks, which aim to derive private information from model updates, and poisoning attacks, which seek to corrupt the model, are still possible. To counter these threats, differential privacy (DP) mechanisms introduce controlled noise into model updates, safeguarding individual client data, and robust aggregation techniques such as Krum and Trimmed Mean are used to identify anomalous or malicious updates.
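
A minimal sketch of the DP idea, clipping a client update and adding calibrated Gaussian noise, is shown below; the clipping bound and noise multiplier are illustrative placeholders, and a production system would also track a formal privacy budget:

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Clip an update's L2 norm, then add Gaussian noise scaled to the
    clipping bound, limiting what any single client reveals."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_mult * clip_norm, size=update.shape)
    return clipped + noise
```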

5. Future Directions

As more advanced federated learning models appear, research must develop more adaptive aggregation algorithms that can cope with the heterogeneity of client devices, such as differences in computing power, connectivity, and data quality. Moreover, building incentive mechanisms will be crucial to encourage participation from diverse clients.

Decentralized federated learning is another significant future trend. Compared to typical FL systems that depend on a central server for aggregating model updates, decentralized methods such as blockchain-based or peer-to-peer federated learning can eliminate the single point of failure. Decentralization offers advantages such as greater transparency and auditability of the aggregation process, which can improve participants' trust. Decentralized systems also provide greater fault tolerance, in the sense that learning continues unimpeded when an individual node fails or is attacked.

Furthermore, another promising direction is to combine federated learning with self-supervised and semi-supervised learning approaches. Since FL models, and especially CNNs, require significant amounts of labeled data, which can be costly or difficult to obtain in domains such as medicine or scientific imaging, adopting self-supervised methods like SimCLR or BYOL can significantly reduce the dependence on human annotation. These approaches enable models to learn expressive representations from unlabeled data, which can then be fine-tuned on very small sets of labeled samples.

Finally, as FL-CNNs become popular and widely used in high-impact use cases, ethical issues arise. Bias and fairness concerns are most serious when data is unevenly distributed across clients, which could result in models that work well for one group while excluding others. Transparency of model decisions is also integral to building user trust and accountability. In addition, clear mechanisms for obtaining users' consent and letting users control how their data is used must be incorporated into federated systems. Above all, ethical standards cannot be an afterthought or add-on; they must be embedded into the design and development phase so that AI can be deployed responsibly and ethically.

6. Conclusion

The combination of FL and CNNs is a milestone toward developing intelligent models that are not only highly capable but also deeply privacy-conscious. By enabling collaborative learning on massive, heterogeneous, and geographically distributed datasets, FL-CNNs address one of the most critical dilemmas in AI today: how to maximize the potential of data without compromising data security or users' trust. This balance lies at the core of current AI concerns, and the emergence of FL-CNNs offers a promising way forward.

Over the last decade, what began as hypothetical experimentation has evolved into a suite of powerful, practical tools with broad applications across many domains. In medicine, for instance, federated CNNs enabled hospitals to collectively train diagnostic models without ever exchanging confidential patient information. FL-CNNs also advance smart city infrastructure by learning from scattered sensors to optimize city management operations, such as traffic lights and energy supply, without violating citizens' privacy.

Despite these developments, however, the path ahead is not easy. Data heterogeneity, where client devices generate data differing in distribution, quality, and size, hinders strong model convergence and consistent performance. Bandwidth and connectivity limitations, especially under low-bandwidth or intermittent connections, complicate the delivery of effective and timely model updates. Furthermore, since FL-CNNs enter mission-critical applications affecting many lives, ethical considerations take on overriding significance. Algorithmic fairness, transparency of model decision-making, informed consent, and equity of access are issues that must be continually addressed so that federated AI solutions serve all groups equally and fairly.

Looking ahead, sustained innovation in and application of FL-CNNs will be essential to building an inclusive, scalable, and trusted AI ecosystem. By combining privacy-protective control with learning resilience, these technologies set a new benchmark for how AI can coexist with human rights and values. As FL-CNNs develop, they hold the potential to open new frontiers in use cases from personalized healthcare to intelligent cities, financial services, and beyond, all on the premise that safeguarding individuals' data and dignity is not secondary but a design priority.
