Add multi-tenant DNS server for external and tenant use

govardhan
2025-11-15 19:27:53 +05:30
parent f958776e59
commit 997ce28c60
2 changed files with 598 additions and 0 deletions

DNS-SERVER-GUIDE.md Normal file

@@ -0,0 +1,220 @@
# Multi-Tenant DNS Server Setup
## 🎯 Overview
This DNS server provides:
- **External DNS** for connectvm.cloud (public access)
- **Multi-tenant DNS** for dev, prod, and QA teams
- **High Availability** with 3 replicas
- **Metrics** for monitoring
## 🌐 DNS Zones Configured
### 1. Main Domain: `connectvm.cloud`
**Public DNS for main services**
Available records:
- `rancher.connectvm.cloud` → 160.30.114.10
- `paste.connectvm.cloud` → 160.30.114.10
- `fleet.connectvm.cloud` → 160.30.114.10
- `hello.connectvm.cloud` → 160.30.114.10
- `dns.connectvm.cloud` → 160.30.114.10
- `*.connectvm.cloud` → 160.30.114.10 (wildcard)
### 2. Dev Team: `dev.connectvm.cloud`
**Development team's DNS zone**
Available records:
- `app1.dev.connectvm.cloud`
- `app2.dev.connectvm.cloud`
- `api.dev.connectvm.cloud`
- `web.dev.connectvm.cloud`
- `dashboard.dev.connectvm.cloud`
- `jenkins.dev.connectvm.cloud`
- `gitlab.dev.connectvm.cloud`
- `db.dev.connectvm.cloud`
- `redis.dev.connectvm.cloud`
- `*.dev.connectvm.cloud` (wildcard)
### 3. Prod Team: `prod.connectvm.cloud`
**Production team's DNS zone**
Available records:
- `api.prod.connectvm.cloud`
- `web.prod.connectvm.cloud`
- `app.prod.connectvm.cloud`
- `admin.prod.connectvm.cloud`
- `portal.prod.connectvm.cloud`
- `db.prod.connectvm.cloud`
- `monitoring.prod.connectvm.cloud`
- `*.prod.connectvm.cloud` (wildcard)
### 4. QA Team: `qa.connectvm.cloud`
**QA/Testing team's DNS zone**
Available records:
- `test.qa.connectvm.cloud`
- `staging.qa.connectvm.cloud`
- `selenium.qa.connectvm.cloud`
- `automation.qa.connectvm.cloud`
- `reports.qa.connectvm.cloud`
- `*.qa.connectvm.cloud` (wildcard)
## 🔧 External Access
### DNS Server IPs:
- **Primary**: `160.30.114.10:30053` (UDP/TCP)
- **Internal**: `10.96.100.100:53` (ClusterIP)
### Configure Clients to Use DNS:
**Linux/macOS:**
```bash
# Linux: add to /etc/resolv.conf (note: resolv.conf cannot specify a port, so this
# needs port 53 reachable; on macOS, set DNS in System Settings → Network instead)
nameserver 160.30.114.10

# Or query the NodePort directly with dig:
dig @160.30.114.10 -p 30053 rancher.connectvm.cloud
```
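On hosts running systemd-resolved, a drop-in file is more durable than editing `/etc/resolv.conf` directly (the file path below is the standard drop-in location; routing only `connectvm.cloud` queries to this server is a suggested setup, not part of the original config):

```
# /etc/systemd/resolved.conf.d/connectvm.conf
[Resolve]
DNS=160.30.114.10
Domains=~connectvm.cloud
```

After creating the file, run `systemctl restart systemd-resolved` to apply it.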
**Windows:**
```powershell
# Set DNS server
netsh interface ip set dns "Ethernet" static 160.30.114.10
```
**Kubernetes Pods:**
```yaml
spec:
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 10.96.100.100
    searches:
      - connectvm.cloud
      - dev.connectvm.cloud
```
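In context, the fragment above could sit in a complete throwaway test Pod (the Pod name, namespace, and image here are assumptions for illustration):

```yaml
# Hypothetical test Pod resolving tenant names through the custom DNS server
apiVersion: v1
kind: Pod
metadata:
  name: dns-test
  namespace: default
spec:
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 10.96.100.100
    searches:
      - connectvm.cloud
      - dev.connectvm.cloud
  containers:
    - name: probe
      image: busybox:1.36
      command: ["sh", "-c", "nslookup app1.dev.connectvm.cloud && sleep 3600"]
```

`kubectl logs dns-test` should then show the lookup answered with 160.30.114.10.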
## 📊 Monitoring
Access DNS metrics:
- **URL**: https://dns.connectvm.cloud/metrics
- **Prometheus endpoint**: Port 9153
Metrics include:
- Query count
- Query types
- Response codes
- Cache hit/miss rates
- Zone transfer stats
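The metrics are exposed in the Prometheus text exposition format on port 9153. A small sketch of pulling one counter out of that output; the sample lines are inlined for illustration (label values are assumptions), in the cluster you would fetch the real thing with `curl -s http://10.96.100.100:9153/metrics`:

```shell
# Sample CoreDNS metrics in Prometheus text format (inlined; values illustrative)
metrics='coredns_dns_requests_total{proto="udp",zone="dev.connectvm.cloud."} 42
coredns_cache_hits_total{server="dns://:53",type="success"} 10
coredns_cache_misses_total{server="dns://:53"} 5'

# Extract the query count for the dev tenant zone
echo "$metrics" | awk '/coredns_dns_requests_total/ && /dev\.connectvm\.cloud/ {print $2}'
```

With the sample data above this prints `42`; cache hit rate would be hits / (hits + misses).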
## 🔒 Security Features
- **Zone isolation**: Each tenant has separate DNS zone
- **No zone transfers**: Zones are read-only
- **Query logging**: All queries are logged
- **Caching**: forwarded queries cached for 300 s (reduces upstream load; note this is caching, not true rate limiting)
## 🛠️ Management
### Add New Record:
1. Edit `dns-server.yaml` ConfigMap
2. Add record to appropriate zone file
3. Increment Serial number
4. Git push → Fleet auto-deploys
**Example** - Add new dev app:
```
newapp IN A 160.30.114.10
```
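Step 3 (bumping the Serial) matters because resolvers and secondaries use it to detect zone changes. A minimal sketch, assuming the conventional `YYYYMMDDNN` format the zone files above already use:

```shell
# Build a date-based SOA serial: YYYYMMDD plus a two-digit revision.
# REV is an assumption -- bump it by hand for a second change on the same day.
REV=01
NEW_SERIAL="$(date +%Y%m%d)${REV}"
echo "$NEW_SERIAL"
```

Paste the result over the old Serial in the SOA record before pushing.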
### Add New Tenant Zone:
1. Create new zone file in ConfigMap
2. Add zone to Corefile
3. Git push
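For illustration, adding a hypothetical `ops.connectvm.cloud` tenant would mean a new Corefile server block plus a matching zone-file key in the ConfigMap (the tenant name and records below are assumptions):

```
# Corefile addition
ops.connectvm.cloud:53 {
    errors
    log
    file /etc/coredns/ops.connectvm.cloud.db ops.connectvm.cloud
    prometheus :9153
}

# ops.connectvm.cloud.db zone skeleton
$ORIGIN ops.connectvm.cloud.
$TTL 3600
@   IN SOA ns1.connectvm.cloud. admin.ops.connectvm.cloud. (
        2025111501 ; Serial
        3600 ; Refresh
        1800 ; Retry
        604800 ; Expire
        3600 ) ; Negative Cache TTL
@   IN NS ns1.connectvm.cloud.
@   IN A  160.30.114.10
*   IN A  160.30.114.10
```

Two easy-to-miss follow-ups: the new `.db` key must also be added to the Deployment's `configMap.items` list so it gets mounted, and the parent zone needs a matching `ops IN NS ns1.connectvm.cloud.` delegation record.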
## 🧪 Testing
Test DNS resolution:
```bash
# Test main domain
dig @160.30.114.10 -p 30053 rancher.connectvm.cloud
# Test dev tenant
dig @160.30.114.10 -p 30053 app1.dev.connectvm.cloud
# Test prod tenant
dig @160.30.114.10 -p 30053 api.prod.connectvm.cloud
# Test QA tenant
dig @160.30.114.10 -p 30053 test.qa.connectvm.cloud
# Test wildcard
dig @160.30.114.10 -p 30053 anything.dev.connectvm.cloud
```
## 📱 Use Cases
### For Dev Team:
```bash
# Deploy app with custom DNS
kubectl create deployment myapp --image=nginx
kubectl expose deployment myapp --port=80
# Access via: myapp.dev.connectvm.cloud
```
### For Prod Team:
```
# Production API endpoint
api.prod.connectvm.cloud → Production API server
```
### For QA Team:
```
# Automated testing
selenium.qa.connectvm.cloud → Selenium Grid
automation.qa.connectvm.cloud → Test Runner
```
## 🚀 High Availability
- **3 replicas** across different nodes
- **Anti-affinity** rules for pod distribution
- **Auto-restart** on failure
- **Health checks**: liveness probed every 10 s, readiness every 5 s
## 📝 DNS Server Info
- **Software**: CoreDNS 1.11.1
- **Protocol**: DNS (UDP/TCP port 53)
- **Upstream**: Google DNS (8.8.8.8, 8.8.4.4)
- **Cache TTL**: 300 seconds
- **Zone TTL**: 3600 seconds (1 hour)
## 🎯 Architecture
```
     External Queries (port 30053)
                 │
                 ▼
          NodePort Service
                 │
                 ▼
        DNS Pods (3 replicas)
┌─────────────┬──────────────┬─────────────┐
│  Main Zone  │   Dev Zone   │  Prod Zone  │
│   QA Zone   │   K8s DNS    │  Upstream   │
└─────────────┴──────────────┴─────────────┘
```
## 🔄 Updates
All DNS updates happen via GitOps:
1. Edit zone files in Git
2. Push to Gitea
3. Fleet auto-deploys in ~15 seconds
4. DNS records updated automatically
No manual DNS server management needed!

dns-server.yaml Normal file

@@ -0,0 +1,378 @@
apiVersion: v1
kind: Namespace
metadata:
name: dns-server
---
apiVersion: v1
kind: ConfigMap
metadata:
name: custom-dns-config
namespace: dns-server
data:
Corefile: |
# ================================================
# External DNS Server for connectvm.cloud
# Handles public DNS queries and tenant DNS
# ================================================
# Main domain - connectvm.cloud (Public)
connectvm.cloud:53 {
errors
log
file /etc/coredns/connectvm.cloud.db connectvm.cloud
prometheus :9153
}
# Tenant: dev-team (Development Team)
dev.connectvm.cloud:53 {
errors
log
file /etc/coredns/dev.connectvm.cloud.db dev.connectvm.cloud
prometheus :9153
}
# Tenant: prod-team (Production Team)
prod.connectvm.cloud:53 {
errors
log
file /etc/coredns/prod.connectvm.cloud.db prod.connectvm.cloud
prometheus :9153
}
# Tenant: qa-team (QA Team)
qa.connectvm.cloud:53 {
errors
log
file /etc/coredns/qa.connectvm.cloud.db qa.connectvm.cloud
prometheus :9153
}
# Internal Kubernetes DNS
cluster.local:53 {
errors
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
}
# Forward all other queries to upstream DNS
.:53 {
errors
log
# health/ready back the Deployment's liveness (:8080) and readiness (:8181)
# probes; without these plugins the probes fail and the pods restart-loop
health :8080
ready :8181
forward . 8.8.8.8 8.8.4.4
cache 300
prometheus :9153
}
# Main domain zone file
connectvm.cloud.db: |
$ORIGIN connectvm.cloud.
$TTL 3600
@ IN SOA ns1.connectvm.cloud. admin.connectvm.cloud. (
2025111501 ; Serial
7200 ; Refresh
3600 ; Retry
1209600 ; Expire
3600 ) ; Negative Cache TTL
; Name Servers
@ IN NS ns1.connectvm.cloud.
@ IN NS ns2.connectvm.cloud.
ns1 IN A 160.30.114.10
ns2 IN A 160.30.114.10
; Main Services (Public)
@ IN A 160.30.114.10
www IN A 160.30.114.10
rancher IN A 160.30.114.10
paste IN A 160.30.114.10
fleet IN A 160.30.114.10
hello IN A 160.30.114.10
dns IN A 160.30.114.10
; Tenant Delegations
dev IN NS ns1.connectvm.cloud.
prod IN NS ns1.connectvm.cloud.
qa IN NS ns1.connectvm.cloud.
; Email
@ IN MX 10 mail.connectvm.cloud.
mail IN A 160.30.114.10
; Wildcard
* IN A 160.30.114.10
# Dev Team Tenant Zone
dev.connectvm.cloud.db: |
$ORIGIN dev.connectvm.cloud.
$TTL 3600
@ IN SOA ns1.connectvm.cloud. admin.dev.connectvm.cloud. (
2025111501 ; Serial
3600 ; Refresh
1800 ; Retry
604800 ; Expire
3600 ) ; Negative Cache TTL
; Name Server
@ IN NS ns1.connectvm.cloud.
; Dev Team Applications
@ IN A 160.30.114.10
app1 IN A 160.30.114.10
app2 IN A 160.30.114.10
api IN A 160.30.114.10
web IN A 160.30.114.10
dashboard IN A 160.30.114.10
jenkins IN A 160.30.114.10
gitlab IN A 160.30.114.10
; Development databases
db IN A 160.30.114.10
redis IN A 160.30.114.10
; Wildcard for dev team
* IN A 160.30.114.10
# Production Team Tenant Zone
prod.connectvm.cloud.db: |
$ORIGIN prod.connectvm.cloud.
$TTL 3600
@ IN SOA ns1.connectvm.cloud. admin.prod.connectvm.cloud. (
2025111501 ; Serial
3600 ; Refresh
1800 ; Retry
604800 ; Expire
3600 ) ; Negative Cache TTL
; Name Server
@ IN NS ns1.connectvm.cloud.
; Production Applications
@ IN A 160.30.114.10
api IN A 160.30.114.10
web IN A 160.30.114.10
app IN A 160.30.114.10
admin IN A 160.30.114.10
portal IN A 160.30.114.10
; Production infrastructure
db IN A 160.30.114.10
cache IN A 160.30.114.10
monitoring IN A 160.30.114.10
; Wildcard for prod team
* IN A 160.30.114.10
# QA Team Tenant Zone
qa.connectvm.cloud.db: |
$ORIGIN qa.connectvm.cloud.
$TTL 3600
@ IN SOA ns1.connectvm.cloud. admin.qa.connectvm.cloud. (
2025111501 ; Serial
3600 ; Refresh
1800 ; Retry
604800 ; Expire
3600 ) ; Negative Cache TTL
; Name Server
@ IN NS ns1.connectvm.cloud.
; QA/Testing Applications
@ IN A 160.30.114.10
test IN A 160.30.114.10
staging IN A 160.30.114.10
selenium IN A 160.30.114.10
automation IN A 160.30.114.10
reports IN A 160.30.114.10
; QA infrastructure
db IN A 160.30.114.10
; Wildcard for QA team
* IN A 160.30.114.10
---
# RBAC for the CoreDNS kubernetes plugin (cluster.local block); without
# list/watch access to Services and EndpointSlices the plugin cannot start
apiVersion: v1
kind: ServiceAccount
metadata:
name: dns-server
namespace: dns-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: dns-server
rules:
- apiGroups: [""]
resources: ["services", "endpoints", "pods", "namespaces"]
verbs: ["list", "watch"]
- apiGroups: ["discovery.k8s.io"]
resources: ["endpointslices"]
verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: dns-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: dns-server
subjects:
- kind: ServiceAccount
name: dns-server
namespace: dns-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: dns-server
namespace: dns-server
labels:
app: dns-server
spec:
replicas: 3
selector:
matchLabels:
app: dns-server
template:
metadata:
labels:
app: dns-server
spec:
serviceAccountName: dns-server
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- dns-server
topologyKey: kubernetes.io/hostname
containers:
- name: coredns
image: coredns/coredns:1.11.1
args: ["-conf", "/etc/coredns/Corefile"]
volumeMounts:
- name: config-volume
mountPath: /etc/coredns
readOnly: true
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- containerPort: 9153
name: metrics
protocol: TCP
livenessProbe:
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /ready
port: 8181
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
resources:
requests:
memory: "256Mi"
cpu: "200m"
limits:
memory: "1Gi"
cpu: "1000m"
volumes:
- name: config-volume
configMap:
name: custom-dns-config
items:
- key: Corefile
path: Corefile
- key: connectvm.cloud.db
path: connectvm.cloud.db
- key: dev.connectvm.cloud.db
path: dev.connectvm.cloud.db
- key: prod.connectvm.cloud.db
path: prod.connectvm.cloud.db
- key: qa.connectvm.cloud.db
path: qa.connectvm.cloud.db
---
# External DNS Service (NodePort for external access)
apiVersion: v1
kind: Service
metadata:
name: dns-external
namespace: dns-server
labels:
app: dns-server
annotations:
metallb.universe.tf/allow-shared-ip: dns
spec:
selector:
app: dns-server
type: NodePort
ports:
- port: 53
targetPort: 53
nodePort: 30053
protocol: UDP
name: dns-udp
- port: 53
targetPort: 53
nodePort: 30053
protocol: TCP
name: dns-tcp
---
# Internal DNS Service (ClusterIP for internal use)
apiVersion: v1
kind: Service
metadata:
name: dns-internal
namespace: dns-server
labels:
app: dns-server
spec:
selector:
app: dns-server
type: ClusterIP
clusterIP: 10.96.100.100
ports:
- port: 53
targetPort: 53
protocol: UDP
name: dns-udp
- port: 53
targetPort: 53
protocol: TCP
name: dns-tcp
---
# Metrics Service
apiVersion: v1
kind: Service
metadata:
name: dns-metrics
namespace: dns-server
labels:
app: dns-server
spec:
selector:
app: dns-server
type: ClusterIP
ports:
- port: 9153
targetPort: 9153
protocol: TCP
name: metrics
---
# Web UI/Metrics Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: dns-metrics-ingress
namespace: dns-server
annotations:
cert-manager.io/cluster-issuer: "selfsigned-issuer"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- dns.connectvm.cloud
secretName: dns-metrics-tls
rules:
- host: dns.connectvm.cloud
http:
paths:
- path: /metrics
pathType: Prefix
backend:
service:
name: dns-metrics
port:
number: 9153