Note: this is a simplified and generalized description of a real system
Introduction
The Call Center Customer Care System was developed by Andersen Consulting for a large US telecommunications company. The primary function of the system is to support interactions with customers who request new services (e.g., new phone lines), request changes to the configuration of existing services (e.g., phone number changes, long-distance company changes, or relocation), or report problems.
The phone company has over 19 million customers. Given how often, on average, a customer changes his or her service configuration, the system has to support up to 400 company representatives simultaneously at a near 7X24 availability level. However, these representatives are not the only means by which a customer can request a change or report a problem. For example, there is a phone service (Quick Service) through which customers can communicate with the system. This is discussed in more detail later.
System Interactions
C4 (the Call Center Customer Care System) interacts with a number of other systems, in particular with:
- A network provisioning system that makes physical changes to the network configurations and supports network management
- A Billing system
- A host of corporate DBs, and
- A number of downstream systems.
The high-level structure of the system interactions is shown in Figure 1. Taken together, these interactions make the overall system large and rather complicated.
Interactions with NOSS
NOSS (Network Operations Support System) is the network management and provisioning system. It is being developed concurrently by a large third-party hardware/software company specializing in communication networks. Its functionality includes:
- Work force management: management of maintenance crews
- Provisioning: putting physical network components in place, such as connections from the curb to the house
- Network creation: a peculiar name for maintaining information about the new and existing physical network
- Activation: automatic turning on/off of services
- Network management, which includes:
  - status monitoring and measurements
  - proactive maintenance
  - diagnostics, and
  - problem reporting
- Field access: a subsystem providing up-to-the-minute information for field technicians.
The NOSS architecture is based on the OSI CMIS (Common Management Information Service). The interface between C4 and NOSS is a large set of messages in a named-tag/value format. Technically, each message interchange is a synchronous pair of messages: a request and a reply. At the application level there are two types of interaction: 1) a request followed by a response that contains the requested information and/or confirmation of the NOSS action, and 2) an overall interaction consisting of two message pairs: a) an initial request followed by a reply containing a request identifier, and b) an unsolicited message from NOSS containing the previously issued request identifier along with related data, followed by a C4 reply acknowledging receipt of the message.
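The two application-level interaction types can be sketched as follows. This is a minimal illustration, not the actual C4/NOSS interface: the wire encoding (`TAG=value` pairs joined by `;`), the tag names (`MSG`, `REQ_ID`, `STATUS`, and so on), and the `FakeNoss` and `C4Client` classes are all assumptions made for the example.

```python
from typing import Dict


def encode(fields: Dict[str, str]) -> str:
    """Serialize a message as tag/value pairs (hypothetical 'TAG=value;' encoding).

    Values must not contain ';' in this toy encoding.
    """
    return ";".join(f"{tag}={value}" for tag, value in fields.items())


def decode(raw: str) -> Dict[str, str]:
    """Parse the hypothetical encoding back into a tag -> value mapping."""
    return dict(pair.split("=", 1) for pair in raw.split(";") if pair)


class FakeNoss:
    """Stand-in for NOSS; the real system would sit behind a network connection."""

    def __init__(self) -> None:
        self._next_id = 1

    def handle(self, raw: str) -> str:
        msg = decode(raw)
        if msg["MSG"] == "QUERY_AVAILABILITY":
            # Type 1: the synchronous reply carries the requested information.
            return encode({"MSG": "REPLY", "STATUS": "OK", "AVAILABLE": "Y"})
        # Type 2a: the synchronous reply only carries a request identifier.
        req_id = str(self._next_id)
        self._next_id += 1
        return encode({"MSG": "REPLY", "STATUS": "ACCEPTED", "REQ_ID": req_id})


class C4Client:
    def __init__(self, noss: FakeNoss) -> None:
        self.noss = noss
        self.pending: Dict[str, Dict[str, str]] = {}    # req_id -> original request
        self.completed: Dict[str, Dict[str, str]] = {}  # req_id -> data sent by NOSS

    def query(self, fields: Dict[str, str]) -> Dict[str, str]:
        """Type 1: one request/reply pair."""
        return decode(self.noss.handle(encode(fields)))

    def submit(self, fields: Dict[str, str]) -> str:
        """Type 2a: send the request and remember the identifier NOSS returns."""
        reply = decode(self.noss.handle(encode(fields)))
        self.pending[reply["REQ_ID"]] = fields
        return reply["REQ_ID"]

    def on_unsolicited(self, raw: str) -> str:
        """Type 2b: match the unsolicited message to its request and acknowledge it."""
        msg = decode(raw)
        req_id = msg["REQ_ID"]
        if req_id not in self.pending:
            return encode({"MSG": "ACK", "REQ_ID": req_id, "STATUS": "UNKNOWN_REQUEST"})
        del self.pending[req_id]
        self.completed[req_id] = msg
        return encode({"MSG": "ACK", "REQ_ID": req_id, "STATUS": "OK"})
```

The point of the sketch is the correlation step: C4 has to keep the request identifier from the first message pair so that it can match the later unsolicited message to the original request.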
The basic interactions between C4 and NOSS are as follows:
- C4 sends a Service Request (service order messages) to NOSS to perform service reconfigurations
- C4 sends queries to NOSS about existing network status or capabilities. This is usually done while an agent is talking to a customer. For example, a request may be sent to NOSS about the availability of service at a particular geographic address/service location, or about possible service activation dates, and
- C4 may ask NOSS to lock resources, such as phone numbers, for a service request.
Interactions with Downstream Systems
Downstream systems include Long-distance Carrier Services, 911 Service, Voice Mail, Interfaces to Phone Directory, and Revenue Collections (credit scoring and checking).
C4 interacts with these systems in two ways:
- It publishes, via the Publish/Subscribe component (P/S in Figure 1), requests that the downstream systems should react to. For example, all new phone connections are published so that the 911 service is connected to them within the required number of hours
- The downstream systems notify C4, via a direct asynchronous message, about significant business events that affect customer configuration. For example, the Long-distance Provider system may notify C4 that a client should be connected to a new long-distance company.
Note: one of the design challenges that we will discuss later is that a direct request from a customer may conflict with a similar request coming from an external system. For example, a customer may call to request a change of long-distance carrier, say from Sprint to MCI, at the same time that AT&T sends a notification that the same customer has now selected AT&T as the long-distance carrier.
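The publishing side of these interactions can be illustrated with a small in-process publish/subscribe broker. This is only a sketch of the pattern: the `PublishSubscribe` class, the `new_connection` topic name, and the two subscriber queues are hypothetical and are not taken from the real P/S component.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], None]


class PublishSubscribe:
    """Toy broker: publishers do not know which systems are listening."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Dict[str, str]) -> int:
        """Deliver the event to every subscriber of the topic; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(dict(event))  # each subscriber receives its own copy
        return len(handlers)


# Hypothetical downstream systems that react to new phone connections.
e911_queue: List[Dict[str, str]] = []
directory_queue: List[Dict[str, str]] = []

bus = PublishSubscribe()
bus.subscribe("new_connection", e911_queue.append)
bus.subscribe("new_connection", directory_queue.append)

delivered = bus.publish("new_connection", {"phone": "555-0100", "address": "12 Main St"})
```

With this shape, adding another downstream system means adding a subscriber; the publishing code in C4 does not change.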
Interactions with corporate DBs and Billing
Some of the major functions of the billing system are: (1) invoice calculation for local services, (2) invoice printing for both local and long-distance services, (3) revenue reporting, and (4) bill inquiry and adjustment.
C4 does not interact with the Billing System, but it works against the same set of DBs. More specifically, C4 interacts with the on-line part of the corporate DBs, while the Billing System works with the batch copy of the on-line DBs (this is shown in Figure 1). Monthly billing is done in a number of cycles. Each cycle processes the billing information of a set of customers. Updates to the data of the current cycle are withheld from the batch copy (they are applied only to the on-line data) until the end of the cycle. At the end of the cycle, the withheld updates are applied in batch; this is transparent to C4.
All customer data are stored in over 100 tables. C4 does not use all of them. During any customer conversation, C4 obtains the customer's basic profile from about a dozen tables. Other tables are read and/or updated on demand.
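One possible reading of the billing-cycle behavior is sketched below. It assumes that updates for customers outside the current cycle reach the batch copy immediately, which the description does not state; the `CustomerData` class and its field names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


class CustomerData:
    """On-line data plus a batch copy that lags behind during a billing cycle."""

    def __init__(self) -> None:
        self.online: Dict[str, Dict[str, str]] = {}
        self.batch: Dict[str, Dict[str, str]] = {}
        self._in_cycle: Set[str] = set()
        self._deferred: List[Tuple[str, str, str]] = []

    def start_cycle(self, customers: Set[str]) -> None:
        """Begin billing the given set of customers."""
        self._in_cycle = set(customers)

    def update(self, customer: str, field: str, value: str) -> None:
        # The on-line data always reflects the update at once.
        self.online.setdefault(customer, {})[field] = value
        if customer in self._in_cycle:
            # Withheld from the batch copy until the cycle ends.
            self._deferred.append((customer, field, value))
        else:
            # Assumption: outside a cycle the batch copy is updated directly.
            self.batch.setdefault(customer, {})[field] = value

    def end_cycle(self) -> int:
        """Apply the withheld updates in order; return how many were applied."""
        applied = len(self._deferred)
        for customer, field, value in self._deferred:
            self.batch.setdefault(customer, {})[field] = value
        self._deferred.clear()
        self._in_cycle = set()
        return applied
```

Callers only ever use `update`, which is one way to picture the end-of-cycle batch update being transparent to C4.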
C4
Functional description
C4 is an OLTP system that handles an interesting type of transaction. A transaction is initiated by a customer call, a so-called Business Event. There are three types of events:
- Service Negotiations
- Account Management, and
- Trouble Call Management.
Each event is then divided into Tasks, which are in turn broken into Activities. Tasks are groups of related activities that the representative and the customer have to complete. For example, if the negotiation is about the customer moving from one address to another, there will be a set of activities related to terminating some of the existing services, a set of activities related to obtaining the new address, and a set of activities related to negotiating new services and activation dates.
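The event, task, and activity hierarchy maps naturally onto a small data structure. The sketch below uses hypothetical class and activity names; only the three event types and the moving example come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class EventType(Enum):
    SERVICE_NEGOTIATION = "service negotiation"
    ACCOUNT_MANAGEMENT = "account management"
    TROUBLE_CALL_MANAGEMENT = "trouble call management"


@dataclass
class Activity:
    name: str
    done: bool = False


@dataclass
class Task:
    name: str
    activities: List[Activity] = field(default_factory=list)

    def is_complete(self) -> bool:
        return all(a.done for a in self.activities)


@dataclass
class BusinessEvent:
    event_type: EventType
    tasks: List[Task] = field(default_factory=list)

    def open_activities(self) -> List[str]:
        """To-do list: every activity not yet completed, as 'task / activity'."""
        return [f"{t.name} / {a.name}"
                for t in self.tasks for a in t.activities if not a.done]

    def is_complete(self) -> bool:
        return all(t.is_complete() for t in self.tasks)


# The customer-move example, with illustrative activity names.
move = BusinessEvent(EventType.SERVICE_NEGOTIATION, [
    Task("Terminate existing services",
         [Activity("Select services to stop"), Activity("Agree stop date")]),
    Task("Obtain new address", [Activity("Capture address")]),
    Task("Negotiate new services",
         [Activity("Choose services"), Activity("Agree activation date")]),
])
move.tasks[1].activities[0].done = True  # the new address has been captured
```

Here `move.open_activities()` lists the four activities still outstanding, which is the kind of to-do list a representative would work through during the call.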
C4 must provide:
- Support for multiple concurrent tasks. For example, the customer should be able to negotiate a number of different services during the same call.
- Integrated support for completing activities (screen sequences, to-do list, context-sensitive data fields, etc.)
- Validation of:
  - availability of the requested service
  - completion of activities and tasks
  - integrity of customer data, and
  - integrity of the final requested configuration
- Advice on available products and product "bundles"
- Resolution of conflicting events, and
- Support for interrupted and long-lasting conversations.
The last two points are particularly interesting. The need to resolve conflicting events arises because events can have multiple so-called Authors. For example, while a wife is negotiating a new phone line with a representative, her husband may be using the phone company's kiosk at his bank to request an ISDN line that would provide an extra two phone lines. In general, events can come from:
- Direct conversations with company representatives (the case described here)
- Automated call center (the phone menu-type system)
- Kiosks (future), and
- Direct customer connections such as Internet (future).
Before C4 sends a service request to NOSS, it has to make sure that all related events have been combined or conflicts resolved.
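A very simple form of this check groups the pending requests by service location and configured attribute, combines identical requests, and flags groups whose requested values differ. The `Request` fields and the `cross_validate` function are hypothetical, and real validation rules would certainly be richer than an equality test.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List, Tuple


@dataclass(frozen=True)
class Request:
    author: str     # e.g. "representative", "kiosk", "downstream"
    location: str   # service location the change applies to
    attribute: str  # what is being configured, e.g. "ld_carrier"
    value: str      # requested setting


Key = Tuple[str, str]


def cross_validate(requests: List[Request]) -> Tuple[Dict[Key, str], List[List[Request]]]:
    """Split pending requests into an agreed configuration and a list of conflicts."""
    groups: DefaultDict[Key, List[Request]] = defaultdict(list)
    for r in requests:
        groups[(r.location, r.attribute)].append(r)

    agreed: Dict[Key, str] = {}
    conflicts: List[List[Request]] = []
    for key, group in groups.items():
        if len({r.value for r in group}) == 1:
            agreed[key] = group[0].value  # identical requests are combined
        else:
            conflicts.append(group)       # must be resolved before anything is sent
    return agreed, conflicts


pending = [
    Request("representative", "12 Main St", "ld_carrier", "MCI"),
    Request("downstream", "12 Main St", "ld_carrier", "AT&T"),
    Request("representative", "12 Main St", "voice_mail", "on"),
    Request("kiosk", "12 Main St", "voice_mail", "on"),
]
agreed, conflicts = cross_validate(pending)
```

In this example the two voice mail requests are combined into one setting, while the two long-distance carrier requests are reported as a conflict.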
The other requirement comes from the fact that a conversation with a customer can be interrupted (for technical reasons, for example) or suspended by the customer or the representative. The first case is rather obvious. An example of the second case is a customer who says something like "let me talk to my wife and I will call you back." In either case, C4 has to manage a context that persists and can be recalled.
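A persisted, recallable context can be as simple as a serialized snapshot keyed by a conversation identifier. The sketch below keeps the snapshots in memory as JSON strings; the `ContextStore` class, the identifier, and the context fields are assumptions, and a real system would write to a database instead.

```python
import json
from typing import Any, Dict


class ContextStore:
    """Holds conversation contexts on the server side (here: a dict of JSON strings)."""

    def __init__(self) -> None:
        self._saved: Dict[str, str] = {}

    def suspend(self, conversation_id: str, context: Dict[str, Any]) -> None:
        """Save a snapshot when the conversation is suspended or interrupted."""
        self._saved[conversation_id] = json.dumps(context)

    def resume(self, conversation_id: str) -> Dict[str, Any]:
        """Return a fresh copy of the last snapshot; raises KeyError if none exists."""
        return json.loads(self._saved[conversation_id])

    def close(self, conversation_id: str) -> None:
        """Discard the snapshot once the conversation is finished."""
        self._saved.pop(conversation_id, None)
```

The snapshot is kept after `resume`, so a second interruption before the next save still leaves something to recall.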
Key architectural challenges
Here is a partial list of the architectural challenges for C4. The challenges are not completely independent, so there is some repetition in the list. A number of the challenges are implied by the execution architecture selected for the system, which is described in the following section.
Managing time and date effectively. Customers may want service changes to take effect in the future, which implies:
- some form of tickler system
- the ability to inform the customer about future changes of rates and/or services
Interfacing with multiple authors of business events. As explained above, the main source of business events is the direct conversation of an agent with a customer. However, other sources, such as downstream systems and kiosks, have to be accommodated. Important points:
- business events from different authors may conflict with regard to the requested configuration at a service location
- business events from different authors may be received and processed at the same point in time; they become different service requests that must be cross-validated to ensure a valid resulting configuration
Architecting something that is economically viable with a small set of customers and yet can grow to a very large network (i.e., 15 million customers):
- it should not require a high initial equipment investment
- it should allow for "leaner" growth (with respect to the cost(capacity) function)
- it should be able to grow at a rapid rate (for example, 1000+ new customers a day)
- processing bottlenecks should be identified, monitored, and eliminated
Validating a requested service configuration in near real time. This implies that:
- C4 has to communicate with NOSS and other systems while guiding agents through tasks and activities, and
- C4 may request to "lock" some of the network resources (such as a few phone numbers) for a fixed amount of time
Support for long-lasting, interrupted sessions. This issue has been described above.
Integrated (smart) performance support for company representatives.
Support for a large number (e.g., 400+) of service representatives concurrently:
- a minimum of 100 service representatives to start
Near 7X24 application availability.
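The fixed-duration lock on network resources mentioned in the list above could behave like a lease that expires on its own. The sketch below only illustrates those semantics: the `ResourceLocks` class, the holder names, and the 600-second duration are hypothetical, and nothing here says which system would actually hold the locks. The current time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Optional, Tuple


class ResourceLocks:
    """Time-limited locks: a lock expires by itself after ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._locks: Dict[str, Tuple[str, float]] = {}  # resource -> (holder, expiry)

    def acquire(self, resource: str, holder: str, now: float) -> bool:
        """Lock the resource, or refuse if another holder has an unexpired lock."""
        current = self._locks.get(resource)
        if current is not None and current[1] > now and current[0] != holder:
            return False
        self._locks[resource] = (holder, now + self.ttl)
        return True

    def holder(self, resource: str, now: float) -> Optional[str]:
        """Return the current holder, or None if unlocked or expired."""
        current = self._locks.get(resource)
        if current is None or current[1] <= now:
            return None
        return current[0]

    def release(self, resource: str, holder: str) -> None:
        """Release early, e.g. when the customer picks a different number."""
        current = self._locks.get(resource)
        if current is not None and current[0] == holder:
            del self._locks[resource]
```

Because the lock expires without any action from the holder, an abandoned or interrupted conversation cannot keep a phone number reserved indefinitely.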
Execution architecture
The system execution architecture is a standard three-tier C/S configuration, as shown in Figure 2.
All agent workstations are PCs running Windows 10, connected to a LAN that is interconnected with a WAN. The middle layer is a cluster of HP9000 servers running UNIX and a TP monitor that can load-balance between them. The TCP/IP communications protocol is used throughout the network. The back end runs on a dedicated high-throughput LAN. This LAN connects the Enterprise and Billing DBMSs to the servers and also provides a TCP/IP connection to NOSS via a module that marshals messages between C4 and NOSS. C4 runs on the PCs and the HP9000 cluster.
Additional architectural requirements are as follows:
- No persistent data caching on the agent workstations, to limit the implications of local failures
- No DBs at office locations
  - no administrators at local offices
  - no maintenance down-time, etc.
- The middle-layer server cluster tuned for performance
  - possibility to add servers to increase throughput
- The back end tuned for DB performance
  - preferred place for persistence
- Well-engineered operations architecture
- High availability cannot be achieved by utilizing fault-tolerant hardware (this option is not economically viable)