anan@think:~/works/openshift-versions/works$ cat install-config.yaml.bkup
additionalTrustBundlePolicy: Proxyonly
apiVersion: v1
baseDomain: qe.devcluster.openshift.com
compute:
- architecture: amd64
  hyperthreading: Disabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Disabled
  name: master
  platform: {}
  replicas: 3
metadata:
  name: weli-test
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-1
    vpc: {}
publish: External
# OCP-22168
The command to extract INFRA_ID:
anan@think:~/works/openshift-versions/works$ INFRA_ID_PREFIX="${INFRA_ID_PREFIX:-${CLUSTER_NAME:-}}"
if [[ -n "${INFRA_ID_PREFIX}" ]]; then
INFRA_ID=$(aws --region "${AWS_REGION}" ec2 describe-vpcs 2>/dev/null | \
jq -r --arg prefix "${INFRA_ID_PREFIX}" \
'.Vpcs[]? |
select(.Tags != null and (.Tags | type) == "array" and (.Tags | length) > 0) |
select(.Tags[]? | select(.Key == "Name" and (.Value | startswith($prefix)))) |
.Tags[]? |
select(.Key != null and (.Key | startswith("kubernetes.io/cluster/"))) |
.Key |
sub("^kubernetes.io/cluster/"; "")' | \
head -n 1)
fi
anan@think:~/works/openshift-versions/works$ echo $INFRA_ID
weli-test-569wj
anan@think:~/works/openshift-versions/works$ echo $INFRA_ID_PREFIX
weli-test
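For comparison, when the installer's metadata.json is still on disk, the same infra ID can be read directly with jq:

```bash
# Read the infra ID straight from the installer's metadata.json (when available).
jq -r '.infraID' metadata.json
```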
anan@think:~/works/openshift-versions/works$ echo "{\"aws\":{\"region\":\"${AWS_REGION}\",\"identifier\":[{\"kubernetes.io/cluster/${INFRA_ID}\":\"owned\"}]}}"
{"aws":{"region":"us-east-1","identifier":[{"kubernetes.io/cluster/weli-test-569wj":"owned"}]}}
anan@think:~/works/openshift-versions/works$ echo "{\"aws\":{\"region\":\"${AWS_REGION}\",\"identifier\":[{\"kubernetes.io/cluster/${INFRA_ID}\":\"owned\"}]}}" | jq
{
"aws": {
"region": "us-east-1",
"identifier": [
{
"kubernetes.io/cluster/weli-test-569wj": "owned"
}
]
}
}
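If the original metadata.json has been lost, the reconstructed JSON above can be written into a minimal metadata.json and passed to the installer's destroy command. A rough sketch; the minimal field set shown here is an assumption (compare with the original metadata.json below):

```bash
# Hypothetical reconstruction: write a minimal metadata.json, then destroy the cluster.
# Assumes INFRA_ID_PREFIX, INFRA_ID, and AWS_REGION are set as above.
cat > metadata.json <<EOF
{"clusterName":"${INFRA_ID_PREFIX}","infraID":"${INFRA_ID}","aws":{"region":"${AWS_REGION}","identifier":[{"kubernetes.io/cluster/${INFRA_ID}":"owned"}]}}
EOF
openshift-install destroy cluster --dir .
```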
Original metadata.json:
anan@think:~/works/openshift-versions/works$ cat metadata.json | jq
{
"clusterName": "weli-test",
"clusterID": "6f84551a-5936-42dc-95f3-a04952f958d2",
"infraID": "weli-test-569wj",
"aws": {
"region": "us-east-1",
"identifier": [
{
"kubernetes.io/cluster/weli-test-569wj": "owned"
},
{
"openshiftClusterID": "6f84551a-5936-42dc-95f3-a04952f958d2"
},
{
"sigs.k8s.io/cluster-api-provider-aws/cluster/weli-test-569wj": "owned"
}
],
"clusterDomain": "weli-test.qe.devcluster.openshift.com"
},
"featureSet": "",
"customFeatureSet": null
}

# OCP-22663 - [ipi-on-aws] Pick instance types for machines per region basis
## Test Case Overview
This test case validates that the OpenShift installer correctly selects instance types for AWS machines based on regional availability. The installer uses a priority-based fallback mechanism to select the best available instance type for each region.
## Current Implementation Behavior
The installer uses the following instance type priority list for AMD64 architecture:
1. m6i.xlarge (primary preference)
2. m5.xlarge (fallback)
3. r5.xlarge (fallback)
4. c5.2xlarge (fallback)
5. m5.2xlarge (fallback)
6. c5d.2xlarge (fallback)
7. r5.2xlarge (fallback)
The installer automatically checks instance type availability in the selected region and availability zones, selecting the first available type from the priority list.
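The same first-match-wins behavior can be illustrated from the shell. A minimal sketch (not the installer's actual code path) that walks the priority list and stops at the first type the region offers:

```bash
# Print the first instance type from the priority list that the region offers.
REGION=us-east-1
for t in m6i.xlarge m5.xlarge r5.xlarge c5.2xlarge m5.2xlarge c5d.2xlarge r5.2xlarge; do
  count=$(aws ec2 describe-instance-type-offerings --region "$REGION" \
    --filters "Name=instance-type,Values=$t" \
    --query 'length(InstanceTypeOfferings)' --output text)
  if [ "$count" -gt 0 ]; then echo "selected: $t"; break; fi
done
```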
## Test Steps
### Test Case 1: Standard Region with m6i Available
Objective: Verify that the installer selects m6i.xlarge when it's available in the region.
Prerequisites:
- AWS credentials configured
- Access to a standard AWS region (e.g., us-east-1, us-west-2, ap-northeast-1, eu-west-1)
Steps:
1. Create the Install Config asset:
   ```
   openshift-install create install-config --dir instance_types1
   ```
2. Modify the region field in install-config.yaml:
   ```yaml
   platform:
     aws:
       region: us-east-1 # or another region where m6i is available
   ```
3. Generate the Kubernetes manifests:
   ```
   openshift-install create manifests --dir instance_types1
   ```

Expected Result:
- The installer should select `m6i.xlarge` as the instance type
- Verify the instance type in the generated manifests:
  ```
  grep -r instanceType: instance_types1/
  ```
- Expected output should show:
  ```
  openshift/99_openshift-cluster-api_master-machines-0.yaml: instanceType: m6i.xlarge
  ```
### Test Case 2: Region Where m6i is Not Available
Objective: Verify that the installer falls back to m5.xlarge when m6i is not available in the region.
Prerequisites:
- AWS credentials configured
- Access to a region where `m6i` instance types are not available (e.g., eu-north-1, eu-west-3, us-gov-east-1)
Steps:
1. Create the Install Config asset:
   ```
   openshift-install create install-config --dir instance_types2
   ```
2. Modify the region field in install-config.yaml:
   ```yaml
   platform:
     aws:
       region: eu-west-3 # Region where m6i may not be available
   ```
3. Generate the Kubernetes manifests:
   ```
   openshift-install create manifests --dir instance_types2
   ```

Expected Result:
- The installer should detect that `m6i.xlarge` is not available and fall back to `m5.xlarge`
- Verify the instance type in the generated manifests:
  ```
  grep -r instanceType: instance_types2/
  ```
- Expected output should show:
  ```
  openshift/99_openshift-cluster-api_master-machines-0.yaml: instanceType: m5.xlarge
  ```
### Test Case 3: Full Cluster Installation Verification
Objective: Verify that the selected instance type works correctly during actual cluster installation.
Prerequisites:
- AWS credentials configured with sufficient permissions
- Valid base domain and pull secret
Steps:
1. Use the install config from Test Case 1 or Test Case 2
2. Launch the cluster:
   ```
   openshift-install create cluster --dir instance_types2
   ```

Expected Result:
- Installation completes successfully
- Master nodes are created with the expected instance type
- Verify instance types of running instances:
  ```
  # After cluster installation, verify via AWS CLI or console
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=*master*" \
    --query 'Reservations[*].Instances[*].[InstanceType,Tags[?Key==`Name`].Value|[0]]' \
    --output table
  ```
- Create a new project and deploy a test application to verify cluster functionality:
  ```
  oc new-project test-instance-types
  oc new-app --image=nginx --name=test-app
  oc get pods -w
  ```
## Additional Verification
### Verify Instance Type Selection Logic
To understand why a specific instance type was selected, check the installer logs:
```
# Enable debug logging
export OPENSHIFT_INSTALL_LOG_LEVEL=debug
openshift-install create manifests --dir instance_types1
```
Look for log messages related to instance type selection and availability checks.
### Manual Instance Type Availability Check
You can manually verify instance type availability in a region using AWS CLI:
```
# Check if m6i.xlarge is available in a specific region
aws ec2 describe-instance-type-offerings \
  --location-type availability-zone \
  --filters "Name=instance-type,Values=m6i.xlarge" \
  --region us-east-1 \
  --query 'InstanceTypeOfferings[*].Location' \
  --output table

# Check if m5.xlarge is available
aws ec2 describe-instance-type-offerings \
  --location-type availability-zone \
  --filters "Name=instance-type,Values=m5.xlarge" \
  --region eu-west-3 \
  --query 'InstanceTypeOfferings[*].Location' \
  --output table
```

## Notes
1. Instance Type Availability: Instance type availability can vary by region and availability zone. The installer automatically handles this by checking availability and selecting the best option.
2. Regional Overrides: If specific regions require different instance type priorities, they can be configured in pkg/types/aws/defaults/platform.go using the defaultMachineTypes map.
3. Architecture Support: This test case focuses on AMD64 architecture. ARM64 architecture uses different instance types (e.g., m6g.xlarge).
4. Version Compatibility:
   - For OpenShift 4.10 and later: the default instance type is `m6i.xlarge`, with fallback to `m5.xlarge` if `m6i` is not available
   - For OpenShift 4.6 to 4.9: the default instance type was `m5.xlarge`
   - For OpenShift 4.5 and earlier: the default instance type was `m4.xlarge`
## Implementation Details
This section explains how the instance type selection logic works in the codebase, including the key components and their interactions.
### 1. Instance Type Defaults Definition
Location: pkg/types/aws/defaults/platform.go
The InstanceTypes() function defines the default priority list of instance types based on architecture and topology:
```go
// InstanceTypes returns a list of instance types, in decreasing priority order
func InstanceTypes(region string, arch types.Architecture, topology configv1.TopologyMode) []string {
	// Check for region-specific overrides first
	if classesForArch, ok := defaultMachineTypes[arch]; ok {
		if classes, ok := classesForArch[region]; ok {
			return classes
		}
	}

	instanceSize := defaultInstanceSizeHighAvailabilityTopology // "xlarge"
	// Single-node topology requires a larger instance (2xlarge) for 8 cores
	if topology == configv1.SingleReplicaTopologyMode {
		instanceSize = defaultInstanceSizeSingleReplicaTopology // "2xlarge"
	}

	switch arch {
	case types.ArchitectureARM64:
		return []string{
			fmt.Sprintf("m6g.%s", instanceSize),
		}
	default: // AMD64
		return []string{
			fmt.Sprintf("m6i.%s", instanceSize), // Primary: m6i.xlarge
			fmt.Sprintf("m5.%s", instanceSize),  // Fallback 1: m5.xlarge
			fmt.Sprintf("r5.%s", instanceSize),  // Fallback 2: r5.xlarge
			"c5.2xlarge",  // Fallback 3
			"m5.2xlarge",  // Fallback 4
			"c5d.2xlarge", // Fallback 5 (Local Zone compatible)
			"r5.2xlarge",  // Fallback 6
		}
	}
}
```

Key Points:
- Returns instance types in priority order (highest to lowest)
- Supports region-specific overrides via the defaultMachineTypes map
- Adjusts instance size based on topology (HA vs single-node)
- Different instance types for ARM64 vs AMD64 architectures
### 2. Instance Type Selection Logic
Location: pkg/asset/machines/aws/instance_types.go
The PreferredInstanceType() function selects the best available instance type by checking availability in the specified zones:
```go
// PreferredInstanceType returns a preferred instance type from the list of
// instance types provided in descending order of preference
func PreferredInstanceType(ctx context.Context, meta *awsconfig.Metadata, types []string, zones []string) (string, error) {
	if len(types) == 0 {
		return "", errors.New("at least one instance type required")
	}

	// Create an EC2 client to query instance type availability
	client, err := awsconfig.NewEC2Client(ctx, awsconfig.EndpointOptions{
		Region:    meta.Region,
		Endpoints: meta.Services,
	})
	if err != nil {
		return "", fmt.Errorf("failed to create EC2 client: %w", err)
	}

	// Query AWS to get instance type availability per zone
	found, err := getInstanceTypeZoneInfo(ctx, client, types, zones)
	if err != nil {
		// If the query fails, return the first type as a fallback
		return types[0], err
	}

	// Iterate through types in priority order
	for _, t := range types {
		// Check if this instance type is available in ALL required zones
		if found[t].HasAll(zones...) {
			return t, nil
		}
	}

	// If no type is available in all zones, return the first type with an error
	return types[0], errors.New("no instance type found for the zone constraint")
}
```
The getInstanceTypeZoneInfo() function queries the AWS EC2 API to check instance type availability:
```go
func getInstanceTypeZoneInfo(ctx context.Context, client *ec2.Client, types []string, zones []string) (map[string]sets.Set[string], error) {
	found := map[string]sets.Set[string]{}

	// Query the AWS EC2 DescribeInstanceTypeOfferings API
	resp, err := client.DescribeInstanceTypeOfferings(ctx, &ec2.DescribeInstanceTypeOfferingsInput{
		Filters: []ec2types.Filter{
			{
				Name:   aws.String("location"),
				Values: zones, // Filter by availability zones
			},
			{
				Name:   aws.String("instance-type"),
				Values: types, // Filter by instance types
			},
		},
		LocationType: ec2types.LocationTypeAvailabilityZone,
	})
	if err != nil {
		return found, err
	}

	// Build a map: instance type -> set of available zones
	for _, offering := range resp.InstanceTypeOfferings {
		f, ok := found[string(offering.InstanceType)]
		if !ok {
			f = sets.New[string]()
			found[string(offering.InstanceType)] = f
		}
		f.Insert(aws.ToString(offering.Location))
	}
	return found, nil
}
```

Key Points:
- Queries AWS EC2 API to check real-time instance type availability
- Requires instance type to be available in ALL specified availability zones
- Returns first available type from priority list
- Falls back to first type if API query fails
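The all-zones constraint can also be checked by hand; a rough shell approximation (not the installer's code path) that lists, per candidate type, which zones offer it:

```bash
# For each candidate type, list the zones offering it; a type qualifies only if
# every zone the cluster will use appears in its list.
REGION=us-east-1
for t in m6i.xlarge m5.xlarge r5.xlarge; do
  echo "== $t =="
  aws ec2 describe-instance-type-offerings --region "$REGION" \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$t" \
    --query 'InstanceTypeOfferings[*].Location' --output text
done
```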
### 3. Master Machine Configuration
Location: pkg/asset/machines/master.go
The master machine configuration integrates the instance type selection logic:
```go
// When the instance type is not specified by the user
if mpool.InstanceType == "" {
	// Determine the topology mode
	topology := configv1.HighlyAvailableTopologyMode
	if pool.Replicas != nil && *pool.Replicas == 1 {
		topology = configv1.SingleReplicaTopologyMode
	}

	// Get the priority list of instance types
	instanceTypes := awsdefaults.InstanceTypes(
		installConfig.Config.Platform.AWS.Region,
		installConfig.Config.ControlPlane.Architecture,
		topology,
	)

	// Select the best available instance type
	mpool.InstanceType, err = aws.PreferredInstanceType(
		ctx,
		installConfig.AWS,
		instanceTypes,
		mpool.Zones,
	)
	if err != nil {
		// If selection fails, use the first type from the list as a fallback
		logrus.Warn(errors.Wrap(err, "failed to find default instance type"))
		mpool.InstanceType = instanceTypes[0]
	}
}

// Filter zones if the instance type is not available in all default zones
if zoneDefaults {
	mpool.Zones, err = aws.FilterZonesBasedOnInstanceType(
		ctx,
		installConfig.AWS,
		mpool.InstanceType,
		mpool.Zones,
	)
	if err != nil {
		logrus.Warn(errors.Wrap(err, "failed to filter zone list"))
	}
}
```

Key Points:
- Only runs when the user hasn't specified an instance type
- Determines topology (HA vs single-node) based on replica count
- Calls InstanceTypes() to get the priority list
- Calls PreferredInstanceType() to select the best available type
- Filters zones if the selected instance type isn't available in all zones
### 4. Machine Manifest Generation
Location: pkg/asset/machines/aws/machines.go
The Machines() function generates Kubernetes Machine manifests with the selected instance type:
```go
// Machines returns a list of machines for a machinepool
func Machines(clusterID string, region string, subnets aws.SubnetsByZone,
	pool *types.MachinePool, role, userDataSecret string,
	userTags map[string]string, publicSubnet bool) ([]machineapi.Machine,
	*machinev1.ControlPlaneMachineSet, error) {

	mpool := pool.Platform.AWS

	// Create machines for each replica
	for idx := int64(0); idx < total; idx++ {
		zone := mpool.Zones[int(idx)%len(mpool.Zones)]
		subnet, ok := subnets[zone]

		// Create the provider config with the selected instance type
		provider, err := provider(&machineProviderInput{
			clusterID:    clusterID,
			region:       region,
			subnet:       subnet.ID,
			instanceType: mpool.InstanceType, // Uses the selected instance type
			osImage:      mpool.AMIID,
			zone:         zone,
			role:         role,
			// ... other fields
		})

		// Create the Machine object
		machine := machineapi.Machine{
			Spec: machineapi.MachineSpec{
				ProviderSpec: machineapi.ProviderSpec{
					Value: &runtime.RawExtension{Object: provider},
				},
			},
		}
		machines = append(machines, machine)
	}
	return machines, controlPlaneMachineSet, nil
}
```
The provider() function creates the AWS machine provider configuration:
```go
func provider(in *machineProviderInput) (*machineapi.AWSMachineProviderConfig, error) {
	config := &machineapi.AWSMachineProviderConfig{
		TypeMeta: metav1.TypeMeta{
			APIVersion: "machine.openshift.io/v1beta1",
			Kind:       "AWSMachineProviderConfig",
		},
		InstanceType: in.instanceType, // Set from the selected instance type
		// ... other configuration fields
	}
	return config, nil
}
```

Key Points:
- Generates Machine manifests for each replica
- Uses the instance type selected by PreferredInstanceType()
- Creates AWSMachineProviderConfig with the instance type
- Distributes machines across availability zones
### Execution Flow Summary
1. User creates install-config → specifies the region (and optionally an instance type)
2. Master machine configuration (master.go):
   - If the instance type is not specified, calls InstanceTypes() to get the priority list
   - Calls PreferredInstanceType() to select the best available type
3. Instance type selection (instance_types.go):
   - Queries the AWS EC2 API to check availability
   - Returns the first type available in all zones
4. Machine manifest generation (machines.go):
   - Creates Machine objects with the selected instance type
   - Writes manifests to disk
## Related Code References
- Instance type defaults: pkg/types/aws/defaults/platform.go
- Instance type selection logic: pkg/asset/machines/aws/instance_types.go
- Machine manifest generation: pkg/asset/machines/aws/machines.go
- Master machine configuration: pkg/asset/machines/master.go
# OCP-29648
anan@think:~/works/openshift-versions/works$ cat install-config.yaml.bkup
additionalTrustBundlePolicy: Proxyonly
apiVersion: v1
baseDomain: qe.devcluster.openshift.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      amiID: ami-01095d1967818437c
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      amiID: ami-0c1a8e216e46bb60c
  replicas: 3
metadata:
  name: weli-test
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-1
    vpc: {}
publish: External

# Check the AMI on the master nodes (should show ami-0c1a8e216e46bb60c)
echo "Master node AMI:"
aws ec2 describe-instances \
--region "${REGION}" \
--filters "Name=tag:kubernetes.io/cluster/${INFRA_ID},Values=owned" \
"Name=tag:Name,Values=*master*" \
"Name=instance-state-name,Values=running" \
--output json | jq -r '.Reservations[].Instances[].ImageId' | sort | uniq
# Check the AMI on the worker nodes (should show ami-01095d1967818437c)
echo "Worker node AMI:"
aws ec2 describe-instances \
--region "${REGION}" \
--filters "Name=tag:kubernetes.io/cluster/${INFRA_ID},Values=owned" \
"Name=tag:Name,Values=*worker*" \
"Name=instance-state-name,Values=running" \
--output json | jq -r '.Reservations[].Instances[].ImageId' | sort | uniq
Master node AMI:
ami-0c1a8e216e46bb60c
Worker node AMI:
ami-01095d1967818437c
# OCP-21531
Verify the Pull Secret:
anan@think:~/works/openshift-versions/421nightly$ vi ../auth.json
anan@think:~/works/openshift-versions/421nightly$ oc adm release extract --command openshift-install --from=registry.ci.openshift.org/ocp/release:4.21.0-0.nightly-2025-12-22-170804 -a ../auth.json
anan@think:~/works/openshift-versions/421nightly$ du -h openshift-install
654M    openshift-install

Export variables:
anan@think:~/works/openshift-versions/work3$ export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.ci.openshift.org/ocp/release:4.21.0-0.nightly-2025-12-22-170804
anan@think:~/works/openshift-versions/work3$ export OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE=ami-01095d1967818437c

Using a different installer version to install the cluster:
anan@think:~/works/openshift-versions/work3$ ../421rc0/openshift-install version
../421rc0/openshift-install 4.21.0-rc.0
built from commit 8f88b34924c2267a2aa446dcdc6ccdd5260f9c45
release image quay.io/openshift-release-dev/ocp-release@sha256:ecde621d6f74aa1af4cd351f8b571ca2a61bbc32826e49cdf1b7fbff07f04ede
WARNING Found override for release image (registry.ci.openshift.org/ocp/release:4.21.0-0.nightly-2025-12-22-170804). Release Image Architecture is unknown
release architecture unknown
default architecture amd64

anan@think:~/works/openshift-versions/work3$ ../421rc0/openshift-install create cluster
WARNING Found override for release image (registry.ci.openshift.org/ocp/release:4.21.0-0.nightly-2025-12-22-170804). Release Image Architecture is unknown
INFO Credentials loaded from the "default" profile in file "/home/anan/.aws/credentials"
WARNING Found override for OS Image. Please be warned, this is not advised
INFO Successfully populated MCS CA cert information: root-ca 2035-12-23T03:35:54Z 2025-12-25T03:35:54Z
INFO Successfully populated MCS TLS cert information: root-ca 2035-12-23T03:35:54Z 2025-12-25T03:35:54Z
INFO Credentials loaded from the AWS config using "SharedConfigCredentials: /home/anan/.aws/credentials" provider
WARNING Found override for release image (registry.ci.openshift.org/ocp/release:4.21.0-0.nightly-2025-12-22-170804). Please be warned, this is not advised

Check the installed cluster version and the used amiID:
anan@think:~/works/openshift-versions/work3$ export KUBECONFIG=/home/anan/works/openshift-versions/work3/auth/kubeconfig
anan@think:~/works/openshift-versions/work3$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.21.0-0.nightly-2025-12-22-170804   True        False         71m     Cluster version is 4.21.0-0.nightly-2025-12-22-170804

$ oc get machineset.machine.openshift.io -n openshift-machine-api -o json | \
jq -r '.items[] | .spec.template.spec.providerSpec.value.ami.id'
ami-01095d1967818437c
ami-01095d1967818437c
ami-01095d1967818437c
ami-01095d1967818437c
ami-01095d1967818437c
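MachineSets only cover the workers; to confirm the AMI actually used by every running instance (masters included), the describe-instances pattern from OCP-29648 above works here too, assuming INFRA_ID is set:

```bash
# List the ImageId of every running instance owned by the cluster.
aws ec2 describe-instances --region us-east-1 \
  --filters "Name=tag:kubernetes.io/cluster/${INFRA_ID},Values=owned" \
            "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].ImageId' --output text | tr '\t' '\n' | sort -u
```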
# OCP-22425
Cluster A:
anan@think:~/works/openshift-versions/work3$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-106-174.ec2.internal Ready control-plane,master 8h v1.34.2
ip-10-0-157-14.ec2.internal Ready control-plane,master 8h v1.34.2
ip-10-0-30-65.ec2.internal Ready worker 8h v1.34.2
ip-10-0-54-54.ec2.internal Ready worker 8h v1.34.2
ip-10-0-74-122.ec2.internal Ready worker 8h v1.34.2
ip-10-0-76-206.ec2.internal Ready control-plane,master 8h v1.34.2

anan@think:~/works/openshift-versions/work3$ oc get route -n openshift-authentication
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
oauth-openshift oauth-openshift.apps.weli-test.qe.devcluster.openshift.com oauth-openshift 6443 passthrough/Redirect None
anan@think:~/works/openshift-versions/work3$ oc get po -n openshift-apiserver
NAME READY STATUS RESTARTS AGE
apiserver-6b767844c6-2jztv 2/2 Running 0 8h
apiserver-6b767844c6-g4rck 2/2 Running 0 8h
apiserver-6b767844c6-jzv4z 2/2 Running 0 8h

anan@think:~/works/openshift-versions/work3$ oc rsh -n openshift-apiserver apiserver-6b767844c6-2jztv
Defaulted container "openshift-apiserver" out of: openshift-apiserver, openshift-apiserver-check-endpoints, fix-audit-permissions (init)
sh-5.1#

Cluster B:
anan@think:~/works/openshift-versions/works2$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-122-6.ec2.internal Ready control-plane,master 27m v1.34.2
ip-10-0-134-89.ec2.internal Ready control-plane,master 27m v1.34.2
ip-10-0-141-244.ec2.internal Ready worker 13m v1.34.2
ip-10-0-31-52.ec2.internal Ready worker 21m v1.34.2
ip-10-0-67-21.ec2.internal Ready control-plane,master 27m v1.34.2
ip-10-0-96-196.ec2.internal Ready worker 21m v1.34.2

anan@think:~/works/openshift-versions/works2$ oc get po -n openshift-apiserver
NAME READY STATUS RESTARTS AGE
apiserver-574bdcd758-j85sh 2/2 Running 0 10m
apiserver-574bdcd758-l98ph 2/2 Running 0 10m
apiserver-574bdcd758-p922j 2/2 Running 0 8m8s
anan@think:~/works/openshift-versions/works2$ oc rsh -n openshift-apiserver apiserver-574bdcd758-j85sh
Defaulted container "openshift-apiserver" out of: openshift-apiserver, openshift-apiserver-check-endpoints, fix-audit-permissions (init)
sh-5.1# curl -k https://oauth-openshift.apps.weli-test.qe.devcluster.openshift.com/healthz
ok
sh-5.1#
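The same probe can be run non-interactively, which is easier to script; a sketch using oc exec against the pod shown above:

```bash
# Probe the oauth route's /healthz from the apiserver pod without an interactive shell.
oc exec -n openshift-apiserver apiserver-574bdcd758-j85sh -c openshift-apiserver -- \
  curl -sk https://oauth-openshift.apps.weli-test.qe.devcluster.openshift.com/healthz
```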
anan@think:~/works/openshift-versions/works$ head -n 20 install-config.yaml
additionalTrustBundlePolicy: Proxyonly
apiVersion: v1
baseDomain: qe.devcluster.openshift.com
compute:
- architecture: amd64
  hyperthreading: Disabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3

# OCP-23544 Log Analysis Report (Second Test Run)
## Test Scenario
OCP-23544: [ipi-on-aws] [Hyperthreading] Create cluster with hyperthreading disabled with default instance size.
Expected result: cluster creation fails.
Test time: 2025-12-22 22:49:44 (UTC+8)

## Configuration Verification
### ✅ Install Config is correct
rendered-assets/openshift/manifests/cluster-config.yaml shows the key improvements over the first run:
- worker pool: hyperthreading: Disabled (matches the test requirement)
- control plane: hyperthreading: Enabled (matches the test requirement)

### ✅ Instance type configuration
Confirmed from multiple configuration files: each machine pool uses m6i.xlarge ✅ (the default instance size). All nodes therefore use the default instance size m6i.xlarge, as the test requires.

### ✅ MachineConfig configuration
The rendered 99-worker-disable-hyperthreading.yaml confirms that hyperthreading is turned off for the worker pool.
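The installer implements hyperthreading: Disabled by injecting the nosmt kernel argument through this MachineConfig; a quick grep of the rendered asset confirms it (the path below is relative to the asset directory and may vary):

```bash
# The worker MachineConfig should carry the "nosmt" kernel argument.
grep -A2 'kernelArguments' 99-worker-disable-hyperthreading.yaml
```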
## Cluster Status Analysis
### ❌ Worker node creation status
clusterapi/Cluster-openshift-cluster-api-guests-weli-test-87ndj.yaml shows that no worker machines were created (the Cluster conditions report NoWorkers and NoReplicas).

### MachineSet status
99_openshift-cluster-api_worker-machineset-0.yaml shows that all five worker MachineSets (us-east-1a through us-east-1d, plus us-east-1f) have replicas set to 0.

### AWSMachine objects
In the clusterapi/ directory, only the following AWSMachine objects were found:
- AWSMachine-openshift-cluster-api-guests-weli-test-87ndj-bootstrap.yaml
- AWSMachine-openshift-cluster-api-guests-weli-test-87ndj-master-0.yaml
- AWSMachine-openshift-cluster-api-guests-weli-test-87ndj-master-1.yaml
- AWSMachine-openshift-cluster-api-guests-weli-test-87ndj-master-2.yaml

The Cluster status reflects the same picture: bootstrap and masters only, no workers.

### Master node instance status
AWSMachine-openshift-cluster-api-guests-weli-test-87ndj-master-0.yaml reports ready: true; the serial console log was reviewed as well.
## CPU Options Analysis
AWSMachine-openshift-cluster-api-guests-weli-test-87ndj-master-0.yaml shows:
- cpuOptions.ThreadsPerCore is not set
- For the m6i.xlarge instance type, disabling hyperthreading at the AWS level requires ThreadsPerCore: 1
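Whether hyperthreading was actually disabled at the EC2 level can be confirmed directly; a sketch querying CpuOptions for the cluster's instances (INFRA_ID extracted as earlier in this document):

```bash
# ThreadsPerCore=1 means hyperthreading is off at the EC2 level;
# ThreadsPerCore=2 (the m6i default) means it is still on.
aws ec2 describe-instances --region us-east-1 \
  --filters "Name=tag:kubernetes.io/cluster/${INFRA_ID},Values=owned" \
  --query 'Reservations[].Instances[].[InstanceId,CpuOptions.ThreadsPerCore]' \
  --output table
```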
## Failure Cause Analysis
### Possible causes
- Worker nodes were not created
- Control plane initialization failed
- Hyperthreading misconfiguration: the AWS-level setting (ThreadsPerCore: 1) was never applied, and cpuOptions is empty

## Verification Conclusions
### ✅ Fully verified
- Configuration correctness: worker hyperthreading: Disabled, master hyperthreading: Enabled, instance type m6i.xlarge
- Cluster creation failed, as the test case expects
- Worker node creation failed
- Hyperthreading at the AWS instance level: cpuOptions.ThreadsPerCore is not set
- Root cause of the failure (details in the sections above)

## Comparison with the First Test Run
Key improvement: the hyperthreading settings (worker Disabled, master Enabled) now match the test requirements.
检查 AWS 实例配置:
cpuoptions.ThreadsPerCore: 1查看更详细的日志:
验证资源计算:
m6i.xlarge禁用超线程后:4 vCPU → 2 物理核心网络问题排查:
总结
✅ 完全验证了 OCP-23544 的配置要求:
hyperthreading: Disabled✅hyperthreading: Enabled✅m6i.xlarge✅建议:
cpuoptions.ThreadsPerCore: 1来在 AWS 层面禁用超线程