Active Files: A Mechanism for Integrating Legacy Applications into Distributed Systems

Partha Dasgupta
Department of Computer Science, Arizona State University
partha@…

Ayal Itzkovitz and Vijay Karamcheti
Department of Computer Science, New York University
{ayali,vijayk}@…

Abstract

Despite increasingly distributed internet information sources with diverse storage formats and access-control constraints, most of the end applications (e.g., filters and media players) that view and manipulate data from these sources operate against a traditional file-based interface. These legacy applications need to be rewritten to access remote sources, or need to rely upon ad hoc intermediary applications that aggregate the data into a passive file before executing the legacy application.

This paper presents a simple, elegant, programmable method for allowing natural integration of legacy applications into distributed system infrastructures. The approach, called active files, enables multiple information sources to be encapsulated as a local file that serves as their logical proxy. This local file is accessed through a sentinel process, which automatically starts when the file is opened, aggregates data from multiple sources, and filters all access to and from the file. More importantly, the integration of active files into client applications is transparent: an active file is virtually indistinguishable from a regular file. Active files find a variety of applications in both distributed and non-distributed systems. We discuss active files, their semantics, their usage, and their implementations in Windows NT.

1. Introduction

The growth of the internet has been accompanied by an explosion in information sources. These sources are distributed, store data in different formats, are dynamically changing, and may be subject to a variety of constraints including consistency, privacy, and access control. However, most of the end applications that view and manipulate data from these sources (e.g., browser helper applications such as filters and media players) assume a traditional file-based interface, treating files simply as a passive, persistent, uninterpreted sequence of bytes. Consequently, the integration of such legacy applications into distributed systems has required either significant code modification to use a new distributed-system-aware API or the ad hoc use of intermediary applications that isolate the end application from the data sources. These intermediaries perform necessary operations such as access control, filtering, and format conversion before aggregating the data into a passive file that can be handed down to legacy applications.

This research was sponsored by DARPA/AFRL-Rome agreements F30602-96-1-0320 and F30602-99-1-0517, by NSF award CCR-9876128, and by Microsoft.

However, there are significant shortcomings to both these approaches. The first, supported by constructs such as dynamic HTML and cgi-bin scripts to couple activity with network-accessed files, and component frameworks such as DCOM and CORBA to support object-specific access interfaces, has seen restricted use except in a few scenarios. Reasons include the vastly different semantics provided by these constructs, their relatively complicated APIs, and their heavyweight implementations. The second approach, although more popular, has the disadvantage that the data collected by the intermediary is completely decoupled from both the original sources of the information and the end application. Consequently, it is unable to track changes in the original sources or be controlled by the end application. For example, an end application that searches through a collection of distributed databases cannot see changes in these databases, nor influence the progress of the search, when an intermediary first aggregates data from these databases and presents it to the search application as a file.

In this paper, we present an elegant, easy-to-use, and programmable concept that allows natural integration of legacy applications into distributed system infrastructures. This integration is achieved through a construct we call active files, which enables distributed sources of information to be encapsulated in the form of a local file that serves as their logical proxy. An active file has the look and feel of a regular file, but is associated with a sentinel (process) that can act on the streams of data that enter the file on a write or exit the file on a read. More importantly, from the perspective of the end application, active files are indistinguishable from non-active files. No reprogramming or recompilation is necessary to use active files. The sentinel can perform a variety of tasks, including aggregating data from distributed sources, filtering data entering or exiting the file, and performing actions that have external side effects.

To appear in the Proceedings of the International Conference on Distributed Computing Systems, April 2000.

Active files provide a convenient abstraction for alleviating several shortcomings of intermediary approaches. The sentinel process can control the flow of data between distributed sources and the end application, enforcing consistency, privacy, and access-control constraints required by the former while simultaneously yielding control to the end application. For example, an active file can provide the illusion of accessing a single file even though the file data is physically located on multiple remote sites with varied authentication and access-control policies. In addition, it can monitor how the application uses this file, caching only the most frequently accessed contents for performance. Moreover, the cache can be kept consistent with any updates performed to its contents at any of the remote sources. Note that all of these behaviors can be expressed without modifying either the end application or the original sources of data.

Active files can also enhance regular file functionality in non-distributed systems. Traditionally, once a regular file is made accessible to a process, no control can be exercised on when or how the process uses the file. In general, the owner or creator of a file may wish to control and log its accesses, filter the data supplied or stored, or trigger some side effect (such as notification) as a result of the access. For instance, a log file that accepts log entries from many processes may need to enforce some form of locking. The owner of a file containing sensitive data may want to log every access, even those by trusted users. Active files provide an elegant mechanism for expressing many such diverse applications.

Active files differ significantly both from approaches such as Ufo [1] and Prospero [13] that overload or extend the file system interface to provide seamless access to remote files, and from approaches such as Watchdogs [3] that rely on kernel support for notification about file access. Unlike the hard-coded functionality of the former, active files are completely programmable, enabling the expression of general per-file behaviors in a simple, uniform, and conceptually elegant fashion. Moreover, even though an access notification mechanism is sufficient to implement locking, filtering, and other features, the heavyweight nature of kernel involvement restricts its applicability. In contrast, active files can be implemented efficiently at the user level, and consequently can be used for a much larger set of applications.

We describe an implementation of active files in Windows NT. Our implementation relies on the binary interception of Win32 file API calls [2, 11]. The intercepted calls enable the sentinel process to attend to both application demands and the constraints of the distributed information sources, without requiring the active participation of either. We show that several implementation approaches are possible that trade off cost for programming convenience.

The remainder of the paper is organized as follows. Sections 2 and 3 discuss the semantics and uses of active files, with their implementation and programming described in Sections 4 and 5. The performance overheads of active files are presented in Section 6. Section 7 discusses related work. Appendix A describes the Windows NT implementations.

2. Active Files

An active file is a regular file that is associated with an executable program. When an active file is opened, the associated executable is run as a sentinel process. The sentinel connects with the user process using pipes and can directly access both the remote information source(s) and the local file. User process writes are sent to the sentinel along the write pipe, and user process reads extract data out of the read pipe. Logically, the sentinel contains two threads that handle the flow in both directions between the user process, the remote sources, and the local data file (see Figure 1).

2.1. File system interface

An active file is represented in the file system by two passive files: a data file and an executable. Directory operations such as creating, copying, and deleting result in corresponding operations on the passive components. For instance, a copy operation produces a second active file with the same data and executable components as the first one.

User processes interact with a sentinel process using standard file API calls such as CreateFile, OpenFile, ReadFile, WriteFile, and CloseHandle. Other API calls such as GetFileSize are passed on to the sentinel for handling as appropriate. Consequently, from the user process' perspective, interactions with active files are indistinguishable from interactions with ordinary (passive) files. Associating the file handle used by the user process with the two pipes is the responsibility of the implementation.

2.2. Semantics of the sentinel process

The sentinel process is started when a user process opens the active file and terminated when the user process closes it. If multiple user processes open the same active file, multiple sentinels are created, which synchronize among themselves in a program-dependent fashion using semaphores, shared memory, or other forms of interprocess communication (IPC).

Figure 1. The logical view of an active file and a user process.

The sentinel process is best viewed as an entity that aggregates information from and distributes information to remote sources, serving as a two-way filter between the user application and the information sources. The data file associated with an active file acts as a local cache. The active file processes data sent to it by the user process, writing it to the data part and optionally also sending it to the remote location. It also reads the data part of the active file, processing it before making it available to the user process. The sentinel can be a null filter, in which case the active file has the semantics of a passive file. The sentinel can also be practically any program; the system puts no restrictions on its capabilities. Note that an active file can have an empty data part. In this case, the sentinel either interacts directly with the user process by producing or consuming the required data, or acts as a conduit between the application and the network.

Writing the sentinel process is straightforward, although the specifics depend on how the abstraction is implemented. Figure 2 shows the code for a null filter in the simplest implementation strategy, which directly mimics the logical abstraction. The sentinel process consists of two threads. The first reads data in from the network, writes it to the data file (the cache), and then sends the data to a read pipe that transports it to the application. The second thread reads from the write pipe, writes the received data to the cache, and forwards it on to the original source. Section 4 reports on other implementation strategies and the associated sentinel programming. These strategies support handshaking between the user and sentinel processes, and represent more aggressive implementations in which some functionality is migrated between the sentinel and user processes.

2.3. Security implications

Although this paper does not focus on the security aspects of active files, a few points need discussion. Opening an active file is predicated upon access to the passive file components, and launches a program under the user-id of the application that opened the file. This program can, of course, have any side effect, including malicious ones such as destroying data and activating viruses. However, these effects are no different from those initiated by any other executable started under the same user-id. Note that the operating system already places restrictions on how the latter can access the user machine. In applications with additional security requirements, orthogonal techniques such as certificates, code signing, and sandboxing [9] can be used.

3. Uses of Active Files

The active file is a general-purpose construct that has a large number of potential applications in both sequential and distributed systems, limited only by what can be expressed in the sentinel process. Like files, pipes, and scripts, the active file can serve many scenarios when combined with other system programs.

In general, the sentinel process can encapsulate four fundamental actions (see Figure 3): (1) data generation, (2) input and output filtering, (3) aggregation, and (4) distribution. The first two primarily interact with the local data file, while the other two involve remote information sources. Larger applications are constructed by composing these actions in different ways. Note that the data file need not actually exist: the sentinel process just creates the illusion of its existence for client applications.

Data generation. The sentinel process can completely obviate the need for a physical (passive) file that stores the data associated with the active file. An example of such use is a sentinel process that just contains a random number generator. In this case, the corresponding active file appears to client programs as a data file that contains an infinite stream of random numbers.
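The data-generation case can be illustrated with a minimal, portable C sketch. It is not the paper's Windows NT code: the names `rng_sentinel`, `rng_sentinel_init`, and `rng_sentinel_read` are hypothetical, and a xorshift generator is assumed as a stand-in for whatever generator a real sentinel would use. The read routine produces bytes on demand, so no backing data file is required.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel state: only the generator state, no data file. */
typedef struct {
    uint32_t state;
} rng_sentinel;

/* One step of a 32-bit xorshift generator (assumed stand-in generator). */
static uint32_t xorshift32(uint32_t x)
{
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Seed the sentinel; xorshift must not start from zero. */
void rng_sentinel_init(rng_sentinel *s, uint32_t seed)
{
    assert(s != NULL);
    s->state = seed ? seed : 1u;
}

/* Serve a client read of n bytes: every request is always fully satisfied,
 * which is what makes the "file" look like an endless stream. */
size_t rng_sentinel_read(rng_sentinel *s, unsigned char *buf, size_t n)
{
    assert(s != NULL && buf != NULL);
    for (size_t i = 0; i < n; i++) {
        s->state = xorshift32(s->state);
        buf[i] = (unsigned char)(s->state & 0xFFu);
    }
    return n;
}
```

In a process-based sentinel, a loop of this kind would write its output to standard output, which the stubs connect to the read pipe.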

```c
#include <windows.h>

#define READ  0   /* thread argument: remote source -> application */
#define WRITE 1   /* thread argument: application -> remote source */

HANDLE hin, hout, hcache, hpipe;

DWORD WINAPI RWThread(LPVOID param)
{
    DWORD dir = (DWORD)(DWORD_PTR)param;
    char buf[1024];
    DWORD rbytes, wbytes;

    while (...) {   /* termination condition elided */
        if (dir == READ) {
            /* read from remote source; pass to the application and the cache */
            ReadFile(hpipe, buf, sizeof buf, &rbytes, NULL);
            WriteFile(hout, buf, rbytes, &wbytes, NULL);
            WriteFile(hcache, buf, rbytes, &wbytes, NULL);
        } else {
            /* write to remote source; update the cache on the way */
            ReadFile(hin, buf, sizeof buf, &rbytes, NULL);
            WriteFile(hcache, buf, rbytes, &wbytes, NULL);
            WriteFile(hpipe, buf, rbytes, &wbytes, NULL);
        }
    }
    return 0;
}

int main(int argc, char *argv[])
{
    HANDLE hthrd[2];
    DWORD tid;

    /* create handles */
    hin = GetStdHandle(STD_INPUT_HANDLE);
    hout = GetStdHandle(STD_OUTPUT_HANDLE);

    /* handles to source, cache */
    hpipe = OpenPipe(argv[1], ..., ...);
    hcache = OpenFile(argv[2], ..., ...);

    /* create threads */
    hthrd[0] = CreateThread(0, 0, RWThread, (LPVOID)(DWORD_PTR)READ, 0, &tid);
    hthrd[1] = CreateThread(0, 0, RWThread, (LPVOID)(DWORD_PTR)WRITE, 0, &tid);
    WaitForMultipleObjects(2, hthrd, TRUE, INFINITE);
    return 0;
}
```

Figure 2. Sentinel implementing a null filter.

Input and output filtering. The sentinel can introduce actions on either all or a subset of the read and write accesses to the active file. This admits a range of uses, from keeping a log of actions to filtering the data read from and written into the data file. A simple example of such filtering is a compressed file. In this case, the sentinel process compresses and decompresses the file data as it is written and read. An advantage of this approach over compressed file systems is that file compression can be handled on a per-file basis, with different compression algorithms used for different types of files. Additionally, both compression and decompression can be demand-driven and performed incrementally. Note that the client application is completely unaware that it is interacting with a compressed file.
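As a rough illustration of the compressed-file example, the following portable C sketch pairs a write-side and a read-side filter. Run-length encoding is used here only as an assumed stand-in for a real compression algorithm, and the function names `rle_encode` and `rle_decode` are hypothetical rather than part of the paper's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write-side filter: encode n input bytes as (run length, byte) pairs.
 * Returns the encoded size, or (size_t)-1 if `out` is too small. */
size_t rle_encode(const unsigned char *in, size_t n,
                  unsigned char *out, size_t cap)
{
    size_t o = 0, i = 0;
    assert(in != NULL && out != NULL);
    while (i < n) {
        unsigned char c = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == c && run < 255)
            run++;
        if (o + 2 > cap)
            return (size_t)-1;
        out[o++] = (unsigned char)run;
        out[o++] = c;
        i += run;
    }
    return o;
}

/* Read-side filter: expand (run length, byte) pairs back into plain bytes.
 * Returns the decoded size, or (size_t)-1 on malformed input or overflow. */
size_t rle_decode(const unsigned char *in, size_t n,
                  unsigned char *out, size_t cap)
{
    size_t o = 0;
    assert(in != NULL && out != NULL);
    if (n % 2 != 0)
        return (size_t)-1;
    for (size_t i = 0; i < n; i += 2) {
        size_t run = in[i];
        if (o + run > cap)
            return (size_t)-1;
        memset(out + o, in[i + 1], run);
        o += run;
    }
    return o;
}
```

A sentinel built this way would call the encoder on data arriving from the write pipe before storing it in the data part, and the decoder on stored data before placing it on the read pipe, so the client only ever sees plain bytes.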

Filtering can also be used to provide a file-based interface to the Windows system registry, considerably simplifying system configuration. The sentinel checks the registry, providing a simplified version (e.g., a plain text file) to the client application. Any modifications by the client application can in turn be parsed by the sentinel process and translated into appropriate registry modifications.

An active file can also combine logging and filtering actions for concurrent and intelligent logging. Assume that several processes log events using the same log file. As the sentinel receives each log record, it locks the file, writes the record, and unlocks the file. The processes generating the logs do not need to know about log file locking. Moreover, the sentinel can perform a variety of functions in the background, such as cleaning up the logs. Achieving similar functionality with passive files would require the client applications to essentially embed all of the code and locking protocols for the log managers.

Figure 3. Fundamental active file actions.

Aggregation. The sentinel can aggregate information from various sources, presenting it to client applications as a conventional file. Examples of these sources include other local or remote files, databases, network connections, or even other processes.

An example of active-file-based aggregation is seamless access to remote files that are not accessible via network-mapped shares. The sentinel accesses the remote file using a standard protocol (e.g., FTP or HTTP), creates a local copy, and makes the copy available to the client application. The sentinel can also merge multiple remote files into a single local file. From the client's viewpoint, remote file accesses are indistinguishable from local ones. Similar transparent access to remote files can also be provided without ever making a local copy: the sentinel directly reads data from and writes data to a network connection.

Aggregation can also be used to dynamically construct files containing data from various sources. An example might be an active file that reflects the latest stock quotes (downloaded by the sentinel from a server) every time the file is opened. Similarly, the inbox file of an e-mail program can be set up so that reading it causes new messages to be retrieved, possibly from multiple remote POP servers.

Distribution. Sentinel processes can also distribute information to various destinations, triggered by file operations against the active file. As with aggregation, these include other local or remote files, databases, network connections, and other processes.

An example of active-file-based distribution is an e-mail application in which the outbox file is programmed to send e-mail to a particular recipient every time some data is written to it. This concept can be extended such that the sentinel process parses the data written to the file to extract the "To" addresses and sends the data to each recipient.

In general, the sentinel process can be used to produce side effects in the active file's environment. These side effects can be either synchronous (i.e., triggered by file operations) or asynchronous (i.e., taking place in the background).

4. Implementing Active Files

We present four approaches to implementing active files. Common to all the approaches is the ability to intercept system calls (especially those for the file system). While this is a general technique known to work for many operating systems, we describe our implementation for Windows NT.

Using interception, we redirect, at runtime, the file system API calls initially intended for the Kernel32 DLL to stub functions that implement the features of active files. We use the "Mediating Connectors" toolkit from USC/ISI [2] to perform this interception. The toolkit allows simple runtime interception and replacement of selected Win32 API calls (or calls to any DLL) with calls to another routine. Moreover, interception can be done in a secure fashion such that the application cannot undo it.
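The general idea of redirecting a call to a stub can be pictured with a function-pointer slot. The sketch below is only an analogy in portable C: it does not show how the Mediating Connectors toolkit works, and every name in it (`read_file_slot`, `install_interception`, `is_active`, and so on) is hypothetical. The slot plays the role of the entry through which the application reaches the system routine; once the stub is installed, calls on ordinary handles are forwarded to the saved original.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*read_file_fn)(int handle, char *buf, size_t n);

/* Stand-in for the real system routine: fills the buffer with 'p'
 * to represent passive-file data. */
static size_t real_read_file(int handle, char *buf, size_t n)
{
    (void)handle;
    memset(buf, 'p', n);
    return n;
}

/* Slot through which the application calls (hypothetical). */
static read_file_fn read_file_slot = real_read_file;
static read_file_fn saved_original = real_read_file;

/* Hypothetical test for "this handle names an active file". */
static int is_active(int handle)
{
    return handle >= 1000;
}

/* Stub: handles active files itself, forwards everything else. */
static size_t stub_read_file(int handle, char *buf, size_t n)
{
    if (!is_active(handle))
        return saved_original(handle, buf, n);
    memset(buf, 'a', n);   /* represents data supplied by a sentinel */
    return n;
}

/* Redirect the slot to the stub, remembering the original routine. */
void install_interception(void)
{
    assert(read_file_slot != stub_read_file);   /* install only once */
    saved_original = read_file_slot;
    read_file_slot = stub_read_file;
}

/* What the unmodified application effectively executes. */
size_t app_read(int handle, char *buf, size_t n)
{
    return read_file_slot(handle, buf, n);
}
```

The application code (`app_read`) is unchanged by the installation, which mirrors the transparency property the paper relies on.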

The four approaches (process, process-plus-control, DLL-with-thread, and DLL-only) reflect different partitionings of the functionality of active files between the user and an external process, and represent different tradeoffs between runtime overheads and the convenience of programming active files. We sketch the implementation approaches below, deferring a discussion of how the sentinel is programmed in each case to Section 5. Appendix A provides Windows NT-specific implementation details.

4.1. Process-based implementation

The process-based implementation approach is the simplest and most intuitive method, directly reflecting active file semantics. When a process performs an OpenFile (or CreateFile) operation on an active file, the call goes to a stub routine, which first creates a new process for running the executable associated with the active file. The stub also passes the created process the name of the data part of the active file for use by the sentinel. Then, the stub creates two pipes and attaches them to the standard input and output of the sentinel process. Finally, the stub stores the handles of the pipes in a structure, returning to the application process a fictitious handle that points to this structure.

The ReadFile and WriteFile calls are also instrumented. The instrumented calls translate operations on the fictitious handle into reads or writes on the appropriate pipe.
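The bookkeeping behind the fictitious handle might look like the following portable C sketch. The table layout, the handle range `AF_HANDLE_BASE`, and the names `af_register`, `af_lookup`, and `af_release` are all assumptions made for illustration; the paper does not give these details, and plain integers stand in for Win32 handles.

```c
#include <assert.h>
#include <stddef.h>

#define AF_MAX_OPEN    16
#define AF_HANDLE_BASE 0x4000   /* hypothetical range for fictitious handles */

/* Per-open-file record kept by the stubs. */
typedef struct {
    int in_use;
    int read_pipe;    /* where the stub reads sentinel output */
    int write_pipe;   /* where the stub writes application data */
} af_entry;

static af_entry af_table[AF_MAX_OPEN];

/* Called by the open stub after the pipes exist.
 * Returns a fictitious handle, or -1 if the table is full. */
int af_register(int read_pipe, int write_pipe)
{
    assert(read_pipe >= 0 && write_pipe >= 0);
    for (int i = 0; i < AF_MAX_OPEN; i++) {
        if (!af_table[i].in_use) {
            af_table[i].in_use = 1;
            af_table[i].read_pipe = read_pipe;
            af_table[i].write_pipe = write_pipe;
            return AF_HANDLE_BASE + i;
        }
    }
    return -1;
}

/* Called by the read/write stubs. NULL means "not an active file",
 * so the stub should forward the call to the original routine. */
af_entry *af_lookup(int handle)
{
    int i = handle - AF_HANDLE_BASE;
    if (i < 0 || i >= AF_MAX_OPEN || !af_table[i].in_use)
        return NULL;
    return &af_table[i];
}

/* Called by the close stub. */
void af_release(int handle)
{
    af_entry *e = af_lookup(handle);
    if (e != NULL)
        e->in_use = 0;
}
```

A read stub would call `af_lookup` first and read from `read_pipe` on a hit; a miss means the handle belongs to an ordinary file.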

This straightforward implementation approach, though convenient to program, suffers from two problems. First, it can support only a subset of the file operations. Operations such as ReadFileScatter (or seek in Unix) and GetFileSize cannot be implemented because there is no way to pass control information between the user process and the sentinel process. The second problem is performance: each file operation requires two protection-domain crossings. These shortcomings are alleviated in the remaining implementation approaches, albeit at the cost of slightly increasing the difficulty of programming active files.

4.2. Process-plus-control based implementation

This approach solves the problem of handshaking between the user and sentinel processes by adding a control channel in addition to the two pipes. As before, the active file stub functions handle all interactions with the control channel and the data pipes. The user process continues to interact with an active file using standard file operations.

In this approach, all API requests from the application are first transmitted to the sentinel process via the control channel, and the response of the sentinel process is read from the read pipe. So when the application process wants to read 50 bytes, a "read 50" command is sent to the sentinel, and then 50 bytes are read from the read pipe. When the application wants to write 30 bytes, a "write 30" command is sent on the control channel and then 30 bytes are written to the write pipe, from which they are retrieved by the sentinel process.
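The "read 50" and "write 30" exchanges can be sketched as a pair of helpers, one for the stub side and one for the sentinel side. The paper does not specify the wire encoding of control messages, so the plain-text format below is an assumption, as are the names `af_request`, `af_format_request`, and `af_parse_request`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { AF_OP_READ, AF_OP_WRITE } af_op;

typedef struct {
    af_op op;
    unsigned long count;   /* number of bytes to read or write */
} af_request;

/* Stub side: build the command text for the control channel.
 * Returns the text length, or -1 if `buf` is too small. */
int af_format_request(char *buf, size_t cap, af_op op, unsigned long count)
{
    int n;
    assert(buf != NULL);
    n = snprintf(buf, cap, "%s %lu",
                 op == AF_OP_READ ? "read" : "write", count);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Sentinel side: decode one command. Returns 0 on success,
 * -1 if the text is not a recognised command. */
int af_parse_request(const char *line, af_request *out)
{
    char word[8];
    unsigned long count;
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%7s %lu", word, &count) != 2)
        return -1;
    if (strcmp(word, "read") == 0)
        out->op = AF_OP_READ;
    else if (strcmp(word, "write") == 0)
        out->op = AF_OP_WRITE;
    else
        return -1;
    out->count = count;
    return 0;
}
```

After sending a read request, the stub would read exactly `count` bytes from the read pipe; after a write request, it would push `count` bytes into the write pipe.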

Likewise, all other file operations are now passed to the sentinel process as commands with arguments. The active file stubs read the results off the read pipe and present appropriately packaged responses (as return codes or structures) to the user process. A set of headers is provided to the application programmer to ease handling of the control channel when writing the sentinel process. Details of the programming interface are discussed in Section 5. Note that the writer of the sentinel process has complete flexibility in deciding when to listen to the control channel. Given different models of usage, the sentinel process might choose to eagerly inject data into the read pipe (anticipating read requests from the user) and eagerly wait to read data from the write pipe (anticipating write requests).

Both the process and process-plus-control implementations of active files are clean and useful. However, as expected, and as verified later in Section 6, they suffer from performance problems, being relatively heavyweight due to excessive context switching and data copying. The additional functionality provided by active files can offset these performance problems in most cases. However, when performance is an overriding concern, active files can be made more efficient by trading off some of their programming convenience and trespassing into the protection boundaries of the application process. We describe two such approaches in the remainder of the section.

Figure 4. Four implementation approaches: process-based, process-plus-control, DLL-with-thread, and DLL-only.

4.3. DLL-with-thread based implementation

Instead of a stand-alone process, this approach encapsulates sentinel functionality in a separate DLL, referred to as the sentinel DLL. The initialization routine of this DLL, activated at load time, is responsible for orchestrating interactions between the user application, the remote information services, and the local data part of the active file.

Opening an active file "injects" [15] the sentinel DLL associated with the file into the application and starts a thread for running the orchestration routine. An application write results in the write stub signalling a write to the sentinel thread. The sentinel thread then provides a buffer to the active file write routine, which copies data from the user buffer to this target buffer. Reads are handled similarly. While the implementation preserves the active file interface and semantics, its architecture is somewhat non-intuitive.

The sentinel is no longer a process running separately from the application, but just a thread in the application. No inter-process context switching is needed, which improves performance. File data is not copied from user space to kernel space and then back to user space (as is the case with pipes); instead, only one user-level copy is made.

4.4. DLL-only based implementation

Although the DLL-with-thread approach performs significantly better than the other two (see Section 6), it still has a performance limitation: all file operations require context switching between the requesting user thread and the sentinel thread. The DLL-only implementation approach eliminates this switch by directly routing file system API calls to appropriate routines in the sentinel DLL.

The active file programmer can express any functionality in these routines, including additional thread and process creation. Although this approach provides the same functionality as the other approaches, the programmer needs to be involved in the low-level details of writing the sentinel DLL and cannot take advantage of a clean, simple interface.

5. Programming the Sentinel Process

We describe below the programming of the sentinel process for each of the above approaches. As expected, the process-based implementations present active file implementers with a simpler interface than do the DLL-based implementations. One can define a common API that spans all implementations; however, this complicates the simple programming in the first two cases. We are currently exploring automatic translation strategies for taking an active file written for a process-based implementation and producing the DLLs necessary in the DLL-based strategies.

5.1. Process-based implementation

Figure 2, described earlier, showed how a sentinel process is written in the process-based active file implementation. This case uses two threads: one collects data from information sources and/or the data portion of the active file and makes it available on standard output, and the other reads data from standard input and distributes it to the information sources, optionally updating the local cache part. This approach has the advantage that the sentinel process can be developed as a standalone executable, independent of its interactions with other processes.

5.2. Process-plus-control based implementation

The sentinel process in this implementation involves only a single thread, which typically blocks on a read on the control channel. Upon receiving a command from the application, the thread wakes up and performs the operation (which might entail putting data into the read pipe or taking data off the write pipe). This implementation relies on a description of all the control messages that can be expected on the control channel. Return codes, if any, are passed back along with the data via the read pipe. Again, the formats of such return values must match the formats expected by the application stubs and are defined by our implementation.
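A single-threaded dispatch loop of this kind can be sketched in portable C as follows. This is an illustration under several assumptions, not the paper's implementation: commands arrive as an in-memory array instead of over a real control channel, the sentinel serves data from an in-memory cache, and the command set (`CMD_READ`, `CMD_WRITE`, `CMD_SEEK`, `CMD_CLOSE`) and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_CAP 256

typedef enum { CMD_READ, CMD_WRITE, CMD_SEEK, CMD_CLOSE } cmd_op;

typedef struct {
    cmd_op op;
    size_t count;                  /* byte count, or offset for CMD_SEEK */
    const unsigned char *payload;  /* bytes from the write pipe (CMD_WRITE) */
} af_cmd;

typedef struct {
    unsigned char data[CACHE_CAP];
    size_t size;   /* bytes currently stored */
    size_t pos;    /* current file position, always <= size */
} af_cache;

/* Process commands in order until CMD_CLOSE or the array ends.
 * Read responses are appended to `out`, which stands in for the read pipe.
 * Returns the number of bytes placed in `out`. */
size_t sentinel_run(af_cache *c, const af_cmd *cmds, size_t ncmds,
                    unsigned char *out, size_t out_cap)
{
    size_t produced = 0;
    assert(c != NULL && cmds != NULL && out != NULL);
    for (size_t i = 0; i < ncmds; i++) {
        const af_cmd *cmd = &cmds[i];
        size_t n = cmd->count;
        switch (cmd->op) {
        case CMD_READ:
            if (n > c->size - c->pos) n = c->size - c->pos;
            if (n > out_cap - produced) n = out_cap - produced;
            memcpy(out + produced, c->data + c->pos, n);
            c->pos += n;
            produced += n;
            break;
        case CMD_WRITE:
            if (n > CACHE_CAP - c->pos) n = CACHE_CAP - c->pos;
            memcpy(c->data + c->pos, cmd->payload, n);
            c->pos += n;
            if (c->pos > c->size) c->size = c->pos;
            break;
        case CMD_SEEK:
            c->pos = n < c->size ? n : c->size;
            break;
        case CMD_CLOSE:
            return produced;
        }
    }
    return produced;
}
```

In a real sentinel the loop would block on the control channel between commands instead of walking an array, and short reads would be reported back to the stub through a return code.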

5.3. DLL-with-thread based implementation

In this implementation, the sentinel is no longer a process but a thread, started in a routine called SentinelThrdMain that has to be defined in the DLL associated with the active file. The thread can use three library calls to communicate with the application:

1. AF_GetControl: gets control messages sent by the application to read/write data or perform other file operations. These messages might affect the information sources or only make local state changes.

2. AF_SendDataToAppl: communicates read responses or file information requests to the application.

3. AF_GetDataFromAppl: gets client writes.

Similar to the process-plus-control strategy, the thread in the SentinelThrdMain routine runs a dispatch loop using calls to AF_GetControl.

5.4. DLL-only based implementation

The programming interface in this case is simply a pass-through, which passes the API calls generated by the application unmodified to the sentinel. The sentinel is a DLL that replaces Win32 file system calls with calls programmed by the active file implementor. This DLL has to provide a set of routines such as OpenFile, CreateFile, ReadFile, and WriteFile. These routines are invoked whenever the application accessing the file calls the corresponding Win32 functions. This is clearly the most efficient implementation, but it places the greatest burden on the programmer.
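One way to picture the DLL-only arrangement is as a table of replacement routines with the same shape as the calls they replace. The portable C sketch below is an assumption-laden illustration: an in-memory `mem_file` stands in for the data part, the `af_routines` table and all function names are hypothetical, and an uppercasing read filter stands in for whatever behavior the sentinel programmer wants.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* In-memory stand-in for the data part of an active file. */
typedef struct {
    unsigned char data[128];
    size_t size;
    size_t pos;
} mem_file;

typedef size_t (*af_read_fn)(mem_file *f, unsigned char *buf, size_t n);
typedef size_t (*af_write_fn)(mem_file *f, const unsigned char *buf, size_t n);

/* The set of routines a sentinel DLL would have to supply. */
typedef struct {
    af_read_fn read;
    af_write_fn write;
} af_routines;

/* Ordinary (passive) behavior: copy bytes out from the current position. */
size_t passive_read(mem_file *f, unsigned char *buf, size_t n)
{
    assert(f != NULL && buf != NULL);
    if (n > f->size - f->pos) n = f->size - f->pos;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return n;
}

/* Ordinary (passive) behavior: append bytes to the file. */
size_t passive_write(mem_file *f, const unsigned char *buf, size_t n)
{
    assert(f != NULL && buf != NULL);
    if (n > sizeof f->data - f->size) n = sizeof f->data - f->size;
    memcpy(f->data + f->size, buf, n);
    f->size += n;
    return n;
}

/* Sentinel-supplied replacement: same signature, filtered result. */
size_t upper_read(mem_file *f, unsigned char *buf, size_t n)
{
    size_t got = passive_read(f, buf, n);
    for (size_t i = 0; i < got; i++)
        buf[i] = (unsigned char)toupper(buf[i]);
    return got;
}

const af_routines passive_routines  = { passive_read, passive_write };
const af_routines sentinel_routines = { upper_read,   passive_write };
```

Because the replacement runs in the caller's own thread, no message passing or context switch is involved, which is the source of the efficiency claimed for this approach; the cost is that the programmer must reproduce the full semantics of each replaced call.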

6. Overhead of Active Files

This section presents the performance of three different implementations of active files, in which the sentinel is a process on the local machine, an injected thread, or a direct DLL implementation of the Read/Write file operations.

Figure 5. Three critical execution paths.

Figure 5 shows three different critical paths of execution of an active file, emulating different caching options carried out by the sentinel. Path 1 represents the case of no cache in the sentinel process. Whenever the application issues a Read, a message is sent on the control channel, asking the sentinel to retrieve the data from the remote service. When the data arrives at the sentinel, the sentinel pushes it over to the application to satisfy the Read operation. When an application issues a Write, the buffer is sent directly to the sentinel, which then sends an update message to the remote service.¹

Path 2 represents the case where the data is cached in the active file on disk. Here, the sentinel interacts with its local file rather than contacting the remote service for getting or updating data, driven by application needs.

Path 3 represents a case similar to the second path, except that the cache resides in the sentinel's memory rather than on disk. The sentinel code uses a memory buffer to store application Writes or respond to application Reads.

Performance Results. We ran the experiments on a cluster of PCs (300 MHz Pentium II) interconnected with 100 Mbps Fast Ethernet. Figure 6 shows measurements for an application that reads and writes fixed-size blocks from an active file (we instrumented the application by intercepting the open/read/write/close calls and handling them as described before). Our measurements cover a variety of block sizes and time 1000 calls of each.
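The measurement procedure (time a fixed number of calls at a given block size and report the mean) can be sketched as below. This is a portable stand-in, not the authors' harness: the names `file_op`, `counting_op`, and `mean_cost_us` are hypothetical, and the standard `clock()` function measures processor time, whereas a real latency measurement would more likely use a high-resolution wall-clock timer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Shape of the operation being timed (hypothetical). */
typedef void (*file_op)(void *ctx, unsigned char *buf, size_t block);

/* Example operation: touches the whole block and counts invocations.
 * `ctx` must point to an unsigned long counter. */
void counting_op(void *ctx, unsigned char *buf, size_t block)
{
    memset(buf, 0, block);
    (*(unsigned long *)ctx)++;
}

/* Run `op` `calls` times on blocks of `block` bytes and return the
 * mean cost per call in microseconds of processor time. */
double mean_cost_us(file_op op, void *ctx, unsigned char *buf,
                    size_t block, int calls)
{
    clock_t start, end;
    assert(op != NULL && calls > 0);
    start = clock();
    for (int i = 0; i < calls; i++)
        op(ctx, buf, block);
    end = clock();
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC / calls;
}
```

Calling `mean_cost_us` with 1000 calls for each of the block sizes 8, 32, 128, 512, and 2048 bytes would yield one data point per block size, matching the layout of the measurements reported in Figure 6.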

The Read and Write results reflect the latency and bandwidth effects of the sentinel, respectively. Since an application is blocked on a read operation, the overhead of interaction with the sentinel process and any processing therein adds to the latency of the read operation. Since writes are issued without waiting for their completion, any increase in the overhead of a write stems from bandwidth restrictions imposed by the sentinel interaction and processing.

As the graphs show, the different active file implementations impact the latency and bandwidth of file operations to different extents. The process-based implementation has the largest impact on latency and bandwidth, while the DLL-only implementation has negligible impact, incurring the same costs as if the application were directly accessing the information sources. The thread implementation represents an intermediate point between these extremes, trading off performance for programming convenience. The DLL implementation introduces only a very thin layer of code (injected into the application process); consequently, it incurs no extra system calls or context switches.²

Footnote 1: For clarity, we only describe the unoptimized interaction between the application and the sentinel. The implementations are optimized to improve buffer reuse and reduce synchronization overheads.

Footnote 2: The Read operation, normally a system call, is sometimes diverted to a user-mode memcpy() call, improving performance over the original.

[Figure 6: three panels, each with a Read plot and a Write plot of Time (μs) against block size (8, 32, 128, 512, and 2048 bytes) for the Process, Thread, and DLL implementations. (a) Sentinel uses a remote source. (b) Sentinel uses a local on-disk cache. (c) Sentinel uses an in-memory cache.]

Figure 6. ReadFile and WriteFile overheads (in μs) of different active file implementations, namely process-with-control (Process), DLL-with-thread (Thread), and DLL-only (DLL), for three critical caching paths that involve the (a) network, (b) local disk, and (c) local memory, respectively. The baseline costs for directly accessing these paths are indistinguishable from the DLL-only case and are not shown.

The process implementation's overheads stem from the extra buffer copying and process context switching occurring in the critical path. Completing the read operation requires a thread in the sentinel process to receive the read request, copy the buffer, send a message, and context switch before the application thread can finish the operation. The results for the Write case are a bit better because data streaming hides some of the latency. However, each Write still requires at least one buffer copy and message receive (and two context switches between processes).

On a file operation, the thread implementation lets the application simply switch over to the sentinel thread, which performs the necessary operations without requiring costly interactions across process boundaries.

Note that the above measurements represent only the baseline overheads of interacting with the sentinel in the different implementations. Since overheads incurred within the sentinel are influenced by its functionality, it is difficult to predict what these costs might be in a particular case. However, our results above show that the active files framework on its own need not introduce extra cost: with the DLL-only implementation, the eventual cost of using active files is determined only by the functionality that they implement, not by the cost of interacting with them.

7. Discussion and Related Work

Active files permit natural integration of legacy applications into distributed systems by automatically interposing a sentinel process between a legacy application, written assuming a traditional file-based interface, and one or more remote sources of information. Our mechanism can be viewed in the context of two groups of related work: efforts whose goal is to allow legacy applications to interface with distributed information sources, and efforts that rely upon kernel support to extend file-system functionality.

Accessing distributed information sources. The traditional approach to using legacy end applications in distributed environments has relied upon the ad hoc use of intermediary applications that first aggregate the data from remote sources and filter it before passing it on to the legacy application. Although suitable for some applications, a shortcoming of this approach is that the aggregated data is decoupled both from the original sources of information and from the end application, which may want to control the aggregation process. In contrast, the sentinel process interacts with both the end application and the remote sources; it can track changes in the original sources and can also be controlled by the demands of the end application.

A more direct approach for accessing remote sources relies upon code modification to use a new distributed system-aware API. These APIs, exemplified by component-based approaches such as OLE [12], COM [16], DCOM [8], and CORBA [14], allow remote sources to be accessed through well-defined interfaces: the interface permits actions in addition to just accessing or updating the data. Despite their generality, these frameworks have seen restricted use because of differences in semantics, complicated APIs, and heavyweight implementations. One contribution of the active files mechanism is to encapsulate use of these frameworks within a sentinel process, which can optionally cache some of the accesses while presenting the client with an intuitive and familiar file-based interface. Such use has the potential of effectively addressing two of the shortcomings of component-based approaches: ease of use and efficiency.

In some cases, access to remote sources of information requires server-side processing, exemplified by approaches such as dynamic HTML, cgi-bin scripts, and servlets [5]. Active files can easily emulate such behavior, with the sentinel process either creating a remote process on demand or talking to an existing service. Moreover, active files can support applications not possible with these approaches, such as aggregating content from various sites.

Extending file-system functionality. The mediating behavior of the sentinel process can be thought of as generalizing existing operating system structures such as named pipes, file-based interfaces to devices (e.g., /dev in Unix), and file-based interfaces for process management (e.g., /proc in Solaris). Active files are similar to these structures in providing a connection-oriented service between client and server processes, but differ in their ability to support the complete file-access API by virtue of the presence of the control channel. The latter feature makes them completely indistinguishable from normal files, facilitating their use in many diverse scenarios. Moreover, instead of requiring that client-server interactions always cross protection domains, active files admit a spectrum of implementations that trade off functionality against runtime overhead.

Several approaches for extending file-system functionality have been proposed. Most of these approaches rely on kernel modifications that enable the stacking of vnodes and templates, as in the SunOS [17], Alex [4], and Ficus [10] file systems, or notification about file access, as in the Watchdogs [3] approach. Extensible operating systems such as SPIN [7], Exokernel [6], and VINO [18] rely on more aggressive modifications to permit introduction of user-specified kernel functionality. Although these mechanisms provide the necessary hooks to implement active file-like functionality, they are either too heavyweight or are inapplicable on commodity operating systems. In contrast, active files can be implemented efficiently at the user level, permitting their use for a much larger set of applications.

Some recent efforts such as Janus [9] and Ufo [1] have used API or system-call interception to extend file-system functionality without requiring kernel modification. Janus restricts the set of files a process can access, and Ufo provides seamless access to remote files. In contrast to the hard-coded functionality of these approaches, active files are completely programmable, enabling the expression of these and other general per-file behaviors in a simple and uniform fashion. Moreover, unlike both these systems, which implement process-centric control, active files enable resource-centric control: the file itself can specify the kind of access control policies that need to be implemented, as well as custom aggregation and caching behaviors.

8. Conclusions

Active files are a new extensible and programmable concept for transparently associating actions with local data files. Because active files use a familiar file interface for reading and writing data and can be easily programmed with user-specified functionality, they enable the natural integration of legacy applications into distributed environments by making it possible for such applications to seamlessly access remote services. We have described four user-level approaches for implementing active files in the Windows NT operating system. These approaches possess different efficiency and programming characteristics, with the most efficient implementation incurring negligible overheads as compared to direct application-level operations.

References

[1] A. D. Alexandrov, M. Ibel, K. E. Schauser, and C. J. Scheiman. Ufo: A personal global file system based on user-level extensions to the operating system. ACM Transactions on Computer Systems, 16(3):207–233, Aug. 1998.
[2] R. Balzer. Mediating connectors 2.0, 1998. 08f99541336c1eb91a375d88/software-sciences/safe-execution-environments.
[3] B. N. Bershad and C. B. Pinkerton. Watchdogs: Extending the UNIX File System. In Proc. 1988 Winter USENIX Conference, pages 267–275, 1988.
[4] V. Cate. Alex - A global filesystem. In Proc. USENIX File Systems Workshop, pages 1–11, May 1992.
[5] WWW Consortium. http//08f99541336c1eb91a375d88.
[6] D. R. Engler et al. Exokernel: an operating system architecture for application-level resource management. In Proc. 15th Symp. on Operating Systems Principles, 1995.
[7] B. N. Bershad et al. Extensibility, safety and performance in the SPIN operating system. In Proc. 15th Symp. on Operating Systems Principles, 1995.
[8] R. I. Frank E. DCOM: Microsoft Distributed Component Object Model. IDG Books Worldwide, 1997.
[9] I. Goldberg, D. Wagner, R. Thomas, and E. A. Brewer. A secure environment for untrusted helper applications. In Proc. 6th Usenix Security Symposium, July 1996.
[10] R. G. Guy et al. Implementation of the Ficus replicated file system. In Proc. 1990 Summer USENIX Conf., June 1990.
[11] G. Hunt and D. Brubacher. Detours: Binary interception of Win32 functions. In Proc. 3rd USENIX Windows NT Symposium, July 1999.
[12] Microsoft OLE DB 2.0 Programmer's Reference and Data Access SDK. Microsoft Press, 1998.
[13] B. C. Neuman. The Prospero File System: A global file system based on the virtual system model. Computing Systems, 5(4):407–432, Fall 1992.
[14] Object Management Group. 08f99541336c1eb91a375d88.
[15] J. Richter. Advanced Windows (3rd Edition). Microsoft Press, 1997.
[16] D. Rogerson. Inside COM. Microsoft Press, 1997.
[17] D. S. H. Rosenthal. Evolving the Vnode interface. In Proc. 1990 Summer USENIX Conference, 1990.
[18] M. Seltzer, Y. Endo, C. Small, and K. Smith. An introduction to the VINO architecture. Technical Report TR-34-94, Harvard University, Cambridge, MA, 1994.

A. Active Files on Windows NT

Active files have an active part and a passive part. Both are saved as (passive) files, relying on the NTFS streams capability to package them as a single data file, which exhibits compatible behavior for standard file operations such as copying and renaming. The active part is either an executable (in the process-based approaches) or a DLL (in the DLL-based approaches), while the passive part is a data file. Note that, as discussed earlier, the data file can be empty.

Whenever an instrumented application tries to open an active file, the active part of the file is executed as a process or injected as a DLL into the application. To implement this action, we leverage the fact that in Windows NT the operating system's functionality is found in DLLs. Therefore, Win32 applications are compiled with "loose links", with address resolution of API functions done at load time. At compile time, the linker constructs an import address table (IAT) for the process, which becomes the target for all API calls. At load time, the appropriate DLLs containing the implementation of these APIs are loaded and the addresses in the IAT are resolved [2, 11, 15].

We manipulate the import table of a running process so that it can use active files. The idea is that all file operations (that reside in kernel32.dll) can now pass through our injected DLL, which implements the connection between the client application and the active part of the active files, all without the client application being designed (or prepared) for it. We use the Mediating Connectors [2] toolkit to provide the DLL injection and API interception functions [15].

Specifically, when executing file operations, the client application calls the corresponding Win32 API functions (e.g., OpenFile, ReadFile, WriteFile). These calls are diverted to active file implementation routines.
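The effect of patching the import table can be illustrated with a small analogy. The sketch below is not Win32 code: a Python dictionary plays the role of the IAT, the `real_*` functions stand in for the kernel32.dll entry points, and the `.af` extension and the `ActiveHandle` class are assumptions made for illustration only.

```python
import io


# Hypothetical stand-ins for the kernel32.dll entry points.
def real_open_file(name):
    return io.BytesIO(b"passive contents of " + name.encode())


def real_read_file(handle, size):
    return handle.read(size)


# The "import address table": the application only ever calls through it.
iat = {"OpenFile": real_open_file, "ReadFile": real_read_file}


def application(name):
    """Legacy code: unaware of whether the table has been patched."""
    handle = iat["OpenFile"](name)
    return iat["ReadFile"](handle, 64)


class ActiveHandle:
    """Dummy handle returned for active files (illustrative only)."""

    def __init__(self, name):
        self.name = name


def inject(table):
    """Replace table entries with stubs that divert active-file calls."""
    orig_open, orig_read = table["OpenFile"], table["ReadFile"]

    def open_stub(name):
        if name.endswith(".af"):  # assumed active-file extension
            return ActiveHandle(name)
        return orig_open(name)  # ordinary file: call the original routine

    def read_stub(handle, size):
        if isinstance(handle, ActiveHandle):
            return b"data produced by the sentinel"[:size]
        return orig_read(handle, size)

    table["OpenFile"], table["ReadFile"] = open_stub, read_stub
```

Because `application` resolves each call through the table at call time, swapping the entries changes its behavior for active files without any change to its own code, which is the property the interception relies on.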

A.1. Implementation details

Our implementation consists of two parts: the code to be put in the application and the code to be put in the sentinel. The code that needs to become part of the application is put in a DLL and injected into the application. Note that this code is part of the active file implementation: it is different from the code contained in the end-user application for using the active file, which accesses the active file as it would an ordinary (passive) file. Moreover, the application-side code is independent of the active file functionality; that is, all active files share this code. The sentinel code also has two parts: a part that is written by the programmer of the active file (the actual logic of the sentinel process) and the code that is supplied by the active file implementation (in the form of headers and library calls). Again, the headers and library calls are independent of the code written by the programmer. The combination of the programmer-written code and the supplied code is called the active file sentinel code.

A.2. Process-Based Implementations

In both process-based implementations, the application-side code is quite straightforward. It is a set of stubs, one for each instrumented API call. For example, the stub for OpenFile() (or CreateFile) checks whether the file name corresponds to an active file (by checking the extension). If the file is not an active file, the stub calls the standard Win32 OpenFile routine. If the file is an active file, either two (in the simple process-based approach) or three (in the process-plus-control based approach) anonymous pipes are created. These pipe handles are duplicated using the DuplicateHandle function. Next, a new process running the executable associated with the active file is started and passed the duplicate pipe handles. Finally, a dummy handle is acquired and supplied as the return file handle to the process that initiated the OpenFile call. An association is also made between the dummy handle and the two or three pipe handles associated with the active file. The CloseHandle call just shuts down the created pipes.

The application-side code for the two process-based implementations differs slightly in how it handles file operations on an open file. In the simple process-based implementation, whenever the application calls ReadFile on some file handle, our stub gets control. The stub checks whether this ReadFile is against the dummy handle we created. If not, we pass it to the file system. If so, we read the requested number of bytes from the read pipe using the actual pipe handle. The WriteFile operation is implemented similarly. Operations such as ReadFileScatter that do not have direct correspondence with operations on pipes are simply dropped (with an appropriate return code) by the client stub.

The only difference in the application stub for the process-plus-control based implementation is that it writes the command into the control pipe prior to reading or writing from the data pipes. These commands are picked up by the sentinel process, which either injects data or extracts data as required at the other end. Note that the control channel allows the active file programmer to provide application-specific functionality even for calls that do not have corresponding pipe operations. These calls are just passed on to the sentinel process, and potentially influence data put into the data pipes. This implementation provides the server code with header files containing information about the types of control messages supported on the control pipe, and library stubs to interpret these messages.
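A sketch of the control-pipe protocol follows. The message layout (opcode, offset, length), the opcode values, and the class and function names are assumptions introduced for illustration; the paper does not specify the wire format. To keep the example self-contained, a thread stands in for the sentinel process, while the three channels are real anonymous pipes.

```python
import os
import struct
import threading

# Hypothetical control-message layout: opcode, offset, length.
CTRL = struct.Struct("<BII")
OP_READ, OP_WRITE, OP_CLOSE = 1, 2, 3


def read_exact(fd, n):
    """Read exactly n bytes from a pipe (os.read may return fewer)."""
    chunks = []
    while n > 0:
        chunk = os.read(fd, n)
        if not chunk:
            raise EOFError("pipe closed")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def sentinel(ctrl_r, data_in, data_out, store):
    """Sentinel loop: act on each command picked up from the control pipe."""
    while True:
        op, offset, length = CTRL.unpack(read_exact(ctrl_r, CTRL.size))
        if op == OP_READ:  # inject data for the application
            os.write(data_out, bytes(store[offset:offset + length]))
        elif op == OP_WRITE:  # extract data the application sent
            store[offset:offset + length] = read_exact(data_in, length)
        else:
            return


class ActiveFileStub:
    """Application-side stub for the process-plus-control variant."""

    def __init__(self, store):
        self.ctrl_r, self.ctrl_w = os.pipe()  # control pipe
        self.to_r, self.to_w = os.pipe()  # data: application -> sentinel
        self.from_r, self.from_w = os.pipe()  # data: sentinel -> application
        self.thread = threading.Thread(
            target=sentinel,
            args=(self.ctrl_r, self.to_r, self.from_w, store),
        )
        self.thread.start()

    def read(self, offset, length):
        # The command goes into the control pipe before the data transfer.
        os.write(self.ctrl_w, CTRL.pack(OP_READ, offset, length))
        return read_exact(self.from_r, length)

    def write(self, offset, block):
        os.write(self.ctrl_w, CTRL.pack(OP_WRITE, offset, len(block)))
        os.write(self.to_w, block)

    def close(self):
        os.write(self.ctrl_w, CTRL.pack(OP_CLOSE, 0, 0))
        self.thread.join()
        for fd in (self.ctrl_r, self.ctrl_w, self.to_r, self.to_w,
                   self.from_r, self.from_w):
            os.close(fd)
```

The sketch assumes reads stay within the stored data; a request past the end would leave the stub waiting for bytes the sentinel never sends, which a fuller protocol would handle by returning the actual length on the control channel.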

A.3. DLL-Based Implementations

Instead of having the sentinel run as an external process, the two DLL-based implementations fold the sentinel into the application process. The sentinel code is injected into the process as part of the DLL that in previous implementations contained only the client stubs. The DLL-with-thread based implementation uses library routines to simulate the data transfer and synchronization. There are six routines, three on the client side and three on the server side:

1. AF_SendControl: This routine sends a control message from the client to the server. The routine does not block.

2. AF_GetControl: This is used in the server to get a control message. It blocks until such a message is received. Of course, these "messages" are implemented using events and shared memory.

3. AF_SendDataToSentinel: A client sends data to the server (on writes) using this routine. As before, the data is passed using a shared memory buffer.

4. AF_GetDataFromAppl: Provides complementary functionality to AF_SendDataToSentinel. It blocks until data is received.

5. AF_SendDataToAppl: Similar to (3), but in the other direction.

6. AF_GetDataFromSentinel: The receiving counterpart of (5) at the client end, analogous to (4).

Instead of running the executable as in the process-based implementations, the OpenFile API call loads the DLL containing the sentinel and starts a thread that runs a routine called SentinelThrdMain. SentinelThrdMain is the replacement for the main() routine used in the earlier approaches. Reading, writing, and other control commands call the above library routines as appropriate.
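The six routines and the sentinel thread can be sketched with events and a shared buffer, as below. The routine names follow the list above (written with underscores), but the `Mailbox` class, the tuple-shaped control messages, and the `read_file`/`write_file` stubs are assumptions made for illustration; Python's `threading.Event` stands in for Win32 event objects.

```python
import collections
import threading


class Mailbox:
    """Shared buffer plus an event; put never blocks, get blocks."""

    def __init__(self):
        self._items = collections.deque()
        self._lock = threading.Lock()
        self._ready = threading.Event()

    def put(self, item):
        with self._lock:
            self._items.append(item)
            self._ready.set()

    def get(self):
        while True:
            self._ready.wait()
            with self._lock:
                if self._items:
                    item = self._items.popleft()
                    if not self._items:
                        self._ready.clear()
                    return item


control, to_sentinel, to_appl = Mailbox(), Mailbox(), Mailbox()


def AF_SendControl(msg):          # client side, does not block
    control.put(msg)


def AF_GetControl():              # server side, blocks
    return control.get()


def AF_SendDataToSentinel(data):  # client side
    to_sentinel.put(data)


def AF_GetDataFromAppl():         # server side, blocks
    return to_sentinel.get()


def AF_SendDataToAppl(data):      # server side
    to_appl.put(data)


def AF_GetDataFromSentinel():     # client side, blocks
    return to_appl.get()


def SentinelThrdMain(store):
    """Sentinel thread: serve control messages until told to close."""
    while True:
        op, offset, length = AF_GetControl()
        if op == "read":
            AF_SendDataToAppl(bytes(store[offset:offset + length]))
        elif op == "write":
            block = AF_GetDataFromAppl()
            store[offset:offset + len(block)] = block
        else:
            return


def read_file(offset, length):
    """Client stub for ReadFile."""
    AF_SendControl(("read", offset, length))
    return AF_GetDataFromSentinel()


def write_file(offset, block):
    """Client stub for WriteFile."""
    AF_SendControl(("write", offset, len(block)))
    AF_SendDataToSentinel(block)
```

Since both sides live in one address space, a transfer here is a reference handed through a shared structure rather than a copy through a pipe, which is the source of the lower overhead reported for this variant.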

The DLL-only based implementation is even simpler. Whenever the user program calls a file operation, the implementation substitutes a call to the corresponding special routine contained in the sentinel DLL (these routines must be named AF_ReadFile, AF_WriteFile, and AF_Control). Note that this approach requires the implementer of the sentinel to handle all of the synchronization and data management.
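A minimal sketch of the DLL-only arrangement is shown below. Only the three routine names come from the text; the `UppercaseSentinel` class, its behavior, the `bind` helper, and the `"size"` control command are hypothetical. The lock inside the sentinel reflects the point that synchronization is the sentinel implementer's responsibility in this variant.

```python
import threading


class UppercaseSentinel:
    """Hypothetical sentinel: stores data and returns it upper-cased."""

    def __init__(self):
        self._buf = bytearray()
        self._lock = threading.Lock()  # the sentinel synchronizes itself

    def AF_ReadFile(self, offset, size):
        with self._lock:
            return bytes(self._buf[offset:offset + size]).upper()

    def AF_WriteFile(self, offset, block):
        with self._lock:
            self._buf[offset:offset + len(block)] = block
            return len(block)

    def AF_Control(self, command, *args):
        with self._lock:
            if command == "size":
                return len(self._buf)
            raise NotImplementedError(command)


REQUIRED = ("AF_ReadFile", "AF_WriteFile", "AF_Control")


def bind(sentinel):
    """Map file operations onto the sentinel's routines, checking the names."""
    missing = [n for n in REQUIRED if not callable(getattr(sentinel, n, None))]
    if missing:
        raise TypeError("sentinel lacks: " + ", ".join(missing))
    return {
        "ReadFile": sentinel.AF_ReadFile,
        "WriteFile": sentinel.AF_WriteFile,
        "Control": sentinel.AF_Control,
    }
```

Each file operation becomes a direct function call into the sentinel, with no message passing or thread switch in between.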
