Differences and Relationships Between Distributed Systems and Clusters in Java

I. The Differences

In one sentence: distributed systems work in parallel, while clusters work in series.

1. A distributed system spreads different business functions across different places. A cluster groups several servers together to perform the same business function.

Every node in the distributed system can be a cluster. However, a cluster is not necessarily distributed.

Example: Sina.com can set up a cluster when it has many visitors. A front-end server receives the requests, and several servers behind it perform the same business function. When a request arrives, the front-end server checks which server is lightly loaded and assigns the task to that server.
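The dispatch step in this example can be sketched in Java as follows. This is only a minimal illustration and not how any real load balancer is implemented: the class names BackendServer and LeastLoadedDispatcher are hypothetical, and "load" is assumed to be the number of requests currently in progress.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical backend: "load" is simply the number of requests in progress.
class BackendServer {
    final String name;
    int activeRequests;

    BackendServer(String name, int activeRequests) {
        this.name = name;
        this.activeRequests = activeRequests;
    }
}

// Hypothetical front-end server that hands each request to the least-loaded backend.
class LeastLoadedDispatcher {
    private final List<BackendServer> servers = new ArrayList<>();

    void register(BackendServer server) {
        servers.add(server);
    }

    // Picks the backend with the fewest active requests and assigns the request to it.
    BackendServer dispatch() {
        if (servers.isEmpty()) {
            throw new IllegalStateException("no backend servers registered");
        }
        BackendServer chosen = servers.get(0);
        for (BackendServer s : servers) {
            if (s.activeRequests < chosen.activeRequests) {
                chosen = s;
            }
        }
        chosen.activeRequests++; // the chosen server now carries one more request
        return chosen;
    }
}
```

A real dispatcher would also decrease the count when a request finishes and would need to be thread-safe; both are left out here to keep the selection logic visible.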

In a narrow sense, a distributed system resembles a cluster, but it is more loosely organized. A cluster, by contrast, is organized so that if one server fails, other servers can take over.

In a distributed system, each node performs a different business function. If one node fails, the business function it provides becomes inaccessible.

2. Simply put, distributed computing improves efficiency by shortening the execution time of individual tasks, while clustering improves efficiency by increasing the number of tasks executed per unit of time.

Let's illustrate this with a simple example:

If a task consists of 10 subtasks and each subtask requires 1 hour, then executing the task on a single server takes 10 hours.

With a distributed solution that provides 10 servers, each server handles one subtask. Ignoring dependencies between subtasks, the task can be completed in just 1 hour. (A typical representative of this working mode is Hadoop's Map/Reduce distributed computing model.)

With a cluster solution that also provides 10 servers, each server can handle the whole task independently. Suppose 10 tasks arrive at the same time. The 10 servers work simultaneously, and 10 hours later all 10 tasks are completed together. Taken as a whole, that is still one task completed per hour!
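The arithmetic behind this example can be written out as a small Java sketch. The class and method names are illustrative only, and the model keeps the text's simplifications: subtasks are independent, and one task needs 10 hours on a single server.

```java
// Worked arithmetic for the 10-subtask example; all names here are illustrative.
class ThroughputExample {
    // Distributed: the subtasks of one task run in parallel on separate servers
    // (dependencies between subtasks are ignored, as in the text).
    static double distributedHoursPerTask(int subtasks, double hoursPerSubtask, int servers) {
        int rounds = (subtasks + servers - 1) / servers; // ceiling division
        return rounds * hoursPerSubtask;
    }

    // Cluster: each server runs a whole task by itself, so one task still takes
    // subtasks * hoursPerSubtask hours, but `servers` tasks finish in that time.
    static double clusterTasksPerHour(int subtasks, double hoursPerSubtask, int servers) {
        double hoursPerTask = subtasks * hoursPerSubtask;
        return servers / hoursPerTask;
    }

    public static void main(String[] args) {
        System.out.println(distributedHoursPerTask(10, 1.0, 10)); // hours for one task
        System.out.println(clusterTasksPerHour(10, 1.0, 10));     // tasks per hour
    }
}
```

Both lines print 1.0, but the units differ: the distributed figure is the time to finish a single task (1 hour instead of 10), while the cluster figure is the completion rate (1 task per hour, with each individual task still taking 10 hours).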

II. Cluster Concept

1. Two Key Features

A cluster is a set of collaborating service entities used to provide a service platform with greater scalability and availability than a single service entity. From the client's perspective, a cluster appears to be a single service entity, but in fact, it is composed of a set of service entities. Compared to a single service entity, a cluster provides the following two key features:

a. Scalability: The performance of the cluster is not limited to a single service entity. New service entities can be added to the cluster dynamically, thereby enhancing the cluster's performance.

b. High Availability: Through redundant service entities, the cluster spares clients from easily running into out-of-service warnings. The same service can be provided by multiple service entities in the cluster. If one service entity fails, another takes over from it. This ability to recover from a failed service entity by switching to another one enhances the availability of the application.

2. Two Major Capabilities

To have the characteristics of scalability and high availability, a cluster must have the following two major capabilities:

a. Load Balancing: Load balancing distributes tasks more evenly across the computing and network resources in the cluster environment.

b. Error Recovery: If a resource executing a task fails for some reason, a resource in another service entity that executes the same task picks it up and completes it. This process, in which one entity's resource fails and another entity's resource transparently continues the task, is called error recovery.

Load balancing and error recovery require that there be resources executing the same task within each service entity, and for each resource executing the same task, the information view (information context) required to perform the task must be the same.
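Error recovery as described above can be sketched in Java like this. It is a simplified illustration under stated assumptions: FailoverExecutor is a hypothetical class, each service entity is modeled as a plain function, and the shared information context is just the argument passed to every entity.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical error-recovery wrapper: every service entity can run the same task,
// and all of them receive the same context, so any one can take over from another.
class FailoverExecutor<C, R> {
    private final List<Function<C, R>> entities;

    FailoverExecutor(List<Function<C, R>> entities) {
        this.entities = entities;
    }

    // Tries each entity in turn; the caller only sees the final result,
    // which is what makes the recovery transparent.
    R execute(C context) {
        RuntimeException lastFailure = null;
        for (Function<C, R> entity : entities) {
            try {
                return entity.apply(context);
            } catch (RuntimeException e) {
                lastFailure = e; // this entity failed; fall through to the next one
            }
        }
        throw new IllegalStateException("all service entities failed", lastFailure);
    }
}
```

The sketch only works because every entity is given the same context object, which mirrors the requirement that the information view be identical across the resources executing the same task.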

3. Two Major Technologies

Two major technologies must be implemented to achieve clustering:

a. Cluster Address: A cluster is composed of multiple service entities, and a client obtains the functions of those entities by accessing the cluster's address. Having a single cluster address (also known as a single image) is a basic feature of a cluster. The component that maintains the cluster address is called a load balancer. Internally, the load balancer manages the addition and removal of service entities. Externally, it translates the cluster address into the internal addresses of the service entities. Some load balancers implement true load balancing algorithms, while others only support task switching. Load balancers that only implement task switching are suited to ACTIVE-STANDBY cluster environments, where only one service entity in the cluster is working at a time. When the working entity fails, the load balancer redirects subsequent tasks to another service entity.

b. Internal Communication: To work together and achieve load balancing and error recovery, the entities within the cluster must communicate frequently. Examples include the heartbeat messages the load balancer uses to test service entities, and the exchange of task execution context between service entities.

Having the same cluster address allows clients to access the computing services provided by the cluster. A single cluster address hides the internal addresses of various service entities, allowing the required computing services to be distributed among the various service entities. Internal communication is the foundation for the normal operation of the cluster, enabling it to have load balancing and error recovery capabilities.
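The way a single cluster address hides the internal addresses can be sketched for the ACTIVE-STANDBY case as follows. Everything here is an assumption made for illustration: the class names ServiceEntity and ActiveStandbyBalancer are hypothetical, and the `alive` flag stands in for the result of a heartbeat test rather than implementing one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical service entity with an internal address and a health flag
// that a heartbeat check would update.
class ServiceEntity {
    final String internalAddress;
    boolean alive = true;

    ServiceEntity(String internalAddress) {
        this.internalAddress = internalAddress;
    }
}

// Hypothetical ACTIVE-STANDBY balancer: clients only know the cluster address;
// the balancer maps it to the first entity that is still marked alive.
class ActiveStandbyBalancer {
    final String clusterAddress;
    private final List<ServiceEntity> entities = new ArrayList<>();

    ActiveStandbyBalancer(String clusterAddress) {
        this.clusterAddress = clusterAddress;
    }

    void add(ServiceEntity entity) { entities.add(entity); }

    void remove(ServiceEntity entity) { entities.remove(entity); }

    // Translates the cluster address into the internal address of the active entity.
    String resolve() {
        for (ServiceEntity e : entities) {
            if (e.alive) {
                return e.internalAddress;
            }
        }
        throw new IllegalStateException("no live service entity behind " + clusterAddress);
    }
}
```

If the first entity is marked as not alive, resolve() returns the standby's address instead, and the client keeps using the same cluster address throughout.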

III. Cluster Classification

Linux clusters are mainly divided into three categories: High Availability Cluster, Load Balancing Cluster, and Scientific Computing Cluster.

1. High Availability Cluster (HA Cluster)

The most common form is an HA cluster made up of 2 nodes. It has many popular but unscientific names, such as "dual-machine hot backup", "dual-machine mutual backup", and "dual-machine".

High availability clusters address the problem of keeping user applications continuously available to external users. (Please note that high availability clusters are not used to protect business data. Their purpose is to keep user business applications serving external users without interruption, and to minimize the impact on the business of failures caused by software, hardware, or human error.)

2. Load Balancing Cluster

In a load balancing system, all nodes in the cluster are active and share the system's workload. Common cluster types such as web server clusters, database clusters, and application server clusters belong to this category.

Load balancing clusters are generally used for web servers, database servers, and other servers that respond to network requests. When a request arrives, this type of cluster can find the servers that are less busy and less loaded and redirect the request to them. In terms of checking the status of other servers, load balancing clusters and fault-tolerant clusters are very similar. The main difference is the number of servers.

3. Scientific Computing Cluster (High Performance Computing Cluster)

A High Performance Computing cluster is abbreviated as HPC cluster. This type of cluster is dedicated to providing powerful computing capabilities that a single computer cannot provide.

IV. The Relationship and Difference Between Distributed Systems and Clusters

A distributed system spreads different business functions across different places.

A cluster groups several servers together to perform the same business function.

Each node in a distributed system can be a cluster.

Clusters are not necessarily distributed.

In a narrow sense, a distributed system resembles a cluster, but it is more loosely organized. A cluster, by contrast, is organized so that if one server fails, other servers can take over.

In a distributed system, each node performs a different business function. If one node fails, the business function it provides becomes inaccessible.

