Learn how to transform an open-source AI research agent into an enterprise-ready solution using Microsoft’s Semantic Kernel framework with Azure OpenAI.

Semantic Kernel Research Agent

Porting GPTResearcher to Semantic Kernel: Building an Enterprise-Ready Research Agent

Enterprise adoption of AI systems often requires integrating open-source innovations with enterprise-grade frameworks that provide security, scalability, and compliance. In this post, I’ll detail the process of porting GPTResearcher—a popular open-source AI research agent—to Microsoft’s Semantic Kernel framework, enabling seamless integration with Azure OpenAI and enterprise authentication.

Why Port to Semantic Kernel?

GPTResearcher is a powerful tool for automating research tasks, but like many open-source projects, it was primarily designed for general use with OpenAI’s public API. Enterprise deployments often need:

Enterprise Authentication: Azure AD integration and token-based auth
API Management: Integration with Azure API Management (APIM)
Monitoring & Evaluation: Enterprise-grade tracking and quality metrics
Specialized Domain Knowledge: Industry-specific capabilities
Multiple Interface Options: Support for various client integration patterns

Semantic Kernel provides an elegant framework for addressing these needs, offering a standardized way to encapsulate AI capabilities as “plugins” and “skills” that can be composed and orchestrated efficiently.

Architecture Overview

The ported system, which we’ll call “J Researcher,” maintains the core capabilities of GPTResearcher while adding enterprise integration through a layered architecture:

graph TD
    A[Client Interfaces] --> B[FastAPI Server]
    B --> C[SemanticKernelManager]
    C --> D[JResearcherPlugin]
    D --> E[Report Classes]
    E --> F[JResearcher Agent]
    F --> G[Multiple Retrievers]
    G --> H[Source Processors]
    
    A -->|REST API| B
    A -->|WebSocket| B
    A -->|Streamlit UI| B
    
    G -->|Web Search| G1[Bing/Google Search]
    G -->|Document Search| G2[Document DB]
    G -->|Financial Data| G3[SEC EDGAR API]
    
    style C fill:#4DA6FF,stroke:#0066CC
    style D fill:#4DA6FF,stroke:#0066CC

Key Components

Client Interfaces:
- REST API for background processing
- WebSocket for real-time progress updates
- Multiple UI options (Streamlit, custom web interface)
FastAPI Server:
- Routes API requests
- Manages WebSocket connections
- Handles background task processing
Semantic Kernel Integration:
- SemanticKernelManager: Initializes and manages the kernel instance
- JResearcherPlugin: Core plugin exposing research capabilities to Semantic Kernel
- Azure AD authentication integration
- Azure OpenAI connection management
Research Logic:
- Specialized report classes for different research types
- Multi-retriever strategy for diverse data sources
- Content processing and source validation
- Dynamic report generation

The Semantic Kernel Integration

Semantic Kernel serves as the foundation for our enterprise integration. Here’s how we implemented it:

1. SemanticKernelManager

This class handles kernel initialization, plugin registration, and provides a clean interface for the server:

class SemanticKernelManager:
    def __init__(self):
        self.kernel = None
        self.plugins = {}
        
    async def initialize(self, azure_config=None):
        """Initialize the Semantic Kernel with Azure OpenAI."""
        if self.kernel:
            return self.kernel
            
        # Configure for Azure OpenAI with AD authentication
        kernel_builder = (
            SemanticKernel.builder()
            .with_azure_openai_client(
                deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"),
                endpoint=os.getenv("AZURE_APIM_URI"),
                credentials=DefaultAzureCredential()
            )
        )
        
        self.kernel = await kernel_builder.build()
        
        # Register the JResearcher plugin
        plugin = JResearcherPlugin()
        self.plugins["JResearch"] = plugin
        self.kernel.add_plugin(plugin, "JResearch")
        
        return self.kernel
    
    async def run_research(self, query, report_type, report_source="web", 
                          tone="Objective", additional_context=None, handler=None):
        """Run a research task using the Semantic Kernel plugin."""
        await self.initialize()
        
        # Prepare arguments for the kernel function
        args = KernelArguments(
            query=query,
            report_type=report_type,
            report_source=report_source,
            tone=tone,
            additional_context=additional_context,
            handler=handler
        )
        
        # Invoke the research function in the plugin
        try:
            result = await self.kernel.invoke("JResearch", "research_query", args)
            return result
        except Exception as e:
            logger.error(f"Error in research execution: {str(e)}")
            raise

2. JResearcherPlugin

The heart of our implementation is the Semantic Kernel plugin that encapsulates the research capabilities:

class JResearcherPlugin:
    @kernel_function(
        description="Research a query and generate a report",
        name="research_query"
    )
    @kernel_function_parameter(name="query", description="The query to research")
    @kernel_function_parameter(name="report_type", description="Type of report to generate")
    @kernel_function_parameter(name="report_source", description="Source for research: web, documents, etc.")
    @kernel_function_parameter(name="tone", description="Tone of the report: Objective, Formal, etc.")
    @kernel_function_parameter(name="additional_context", description="Additional context for the research")
    @kernel_function_parameter(name="handler", description="Handler for progress updates")
    async def research_query(self, query: str, report_type: str = "research_report", 
                            report_source: str = "web", tone: str = "Objective",
                            additional_context: str = None, handler=None) -> Dict:
        """Research a given query and generate a comprehensive report."""
        try:
            # Initialize handler for status updates, if provided
            if handler:
                await handler.handle_status("Research process initiated...")
                await handler.handle_log("Setting up research pipeline")
            
            # Determine which report class to use based on report_type
            if report_type == ReportType.DEEP_RESEARCH.value:
                # Use DeepResearch for comprehensive investigations
                researcher = DeepResearchReport(
                    query=query,
                    report_type=report_type,
                    source=report_source,
                    tone=tone,
                    additional_context=additional_context,
                    handler=handler
                )
                report = await researcher.run()
                
            elif report_type == ReportType.REFERRAL_TEMPLATE.value:
                # Use ReferralTemplate for generating insurance referrals
                researcher = ReferralTemplateReport(
                    query=query,
                    report_type=report_type,
                    source=report_source,
                    tone=tone,
                    additional_context=additional_context,
                    handler=handler
                )
                report = await researcher.run()
                
            else:
                # Use standard research report for other types
                researcher = StandardReport(
                    query=query,
                    report_type=report_type,
                    source=report_source,
                    tone=tone,
                    additional_context=additional_context,
                    handler=handler
                )
                report = await researcher.run()
            
            # Get additional context from researcher if available
            context = {}
            if hasattr(researcher, 'get_report_context'):
                context = researcher.get_report_context()
            
            # Return report and context
            return {"report": report, "context": context}
            
        except Exception as e:
            logger.error(f"Error in research: {str(e)}")
            traceback.print_exc()
            
            # Return error message
            error_message = f"An error occurred during research: {str(e)}"
            return {"report": error_message, "error": str(e)}

3. WebSocket Integration

A key innovation was adding real-time updates via WebSockets, allowing clients to see the research process unfold:

class SemanticKernelWsHandler:
    """Handler to stream updates from Semantic Kernel to WebSocket client."""
    
    def __init__(self, websocket):
        self.websocket = websocket
        
    async def handle_status(self, status_text):
        """Send a status update to the client."""
        await self.websocket.send_json({
            "type": "status",
            "output": status_text
        })
        
    async def handle_log(self, log_text):
        """Send a log message to the client."""
        await self.websocket.send_json({
            "type": "log",
            "output": log_text
        })
        
    async def handle_error(self, error_message):
        """Send an error to the client."""
        await self.websocket.send_json({
            "type": "error",
            "output": str(error_message)
        })
        
    async def handle_report(self, report_content):
        """Send the final report to the client."""
        await self.websocket.send_json({
            "type": "report",
            "output": report_content
        })
        
    async def handle_chunk(self, chunk_content):
        """Send a partial content chunk to the client."""
        await self.websocket.send_json({
            "type": "chunk",
            "output": chunk_content
        })

Enhanced Capabilities

Beyond the core porting, we added several enterprise-specific capabilities:

1. Multiple Research Agents

The system supports specialized research agents for different purposes:

class ReferralTemplateReport:
    """Specialized agent for creating insurance referral templates."""
    
    def __init__(self, query, report_type, source, tone, additional_context=None, handler=None):
        self.query = query
        self.report_type = report_type
        self.source = source
        self.tone = tone
        self.additional_context = additional_context
        self.handler = handler
        self.j_researcher = None
        
    async def run(self):
        """Run the referral template generation process."""
        # Initialize the researcher with specialized configuration
        self.j_researcher = JResearcher(
            query=self.query, 
            report_type=self.report_type,
            source=self.source,
            tone=self.tone,
            system_prompt=self.get_specialized_prompt(),
            template_format="referral_template",
            handler=self.handler
        )
        
        # Run the research process
        report = await self.j_researcher.run()
        return report
        
    def get_specialized_prompt(self):
        """Return specialized system prompt for referral templates."""
        return """
        You are a specialized insurance underwriting assistant. 
        Your task is to create a comprehensive referral template based on research about the company.
        Focus on:
        1. Description of Operations
        2. Safety Program Overview
        3. Risk Control Mechanisms
        
        Format the report as a formal insurance referral document.
        """
        
    def get_report_context(self):
        """Get additional context from the research process."""
        if not self.j_researcher:
            return {}
            
        return {
            "company_name": self.j_researcher.get_company_name(),
            "industry": self.j_researcher.get_industry(),
            "risk_factors": self.j_researcher.get_risk_factors(),
            "sources": self.j_researcher.get_sources()
        }

2. SEC EDGAR Integration

We added financial data sourcing from the SEC EDGAR database:

class SECRetriever:
    """Retrieves company information from SEC EDGAR database."""
    
    def __init__(self):
        self.headers = {
            "User-Agent": "ReferralResearcher research@example.com"
        }
        
    async def search(self, company_name):
        """Search for a company in the SEC EDGAR database."""
        try:
            # Clean company name for search
            search_term = company_name.replace(" ", "+")
            
            # Search for company CIK
            search_url = f"https://www.sec.gov/cgi-bin/browse-edgar?company={search_term}&owner=exclude&action=getcompany"
            async with aiohttp.ClientSession() as session:
                async with session.get(search_url, headers=self.headers) as response:
                    if response.status != 200:
                        return []
                    html = await response.text()
            
            # Extract CIK from search results
            cik_match = re.search(r'CIK=(\d+)', html)
            if not cik_match:
                return []
                
            cik = cik_match.group(1)
            
            # Get company filings metadata
            filings_url = f"https://data.sec.gov/submissions/CIK{cik.zfill(10)}.json"
            async with aiohttp.ClientSession() as session:
                async with session.get(filings_url, headers=self.headers) as response:
                    if response.status != 200:
                        return []
                    filings_data = await response.json()
            
            # Extract latest 10-K report
            recent_filings = filings_data.get("filings", {}).get("recent", {})
            form_types = recent_filings.get("form", [])
            accession_numbers = recent_filings.get("accessionNumber", [])
            
            ten_k_indices = [i for i, form in enumerate(form_types) if form == "10-K"]
            if not ten_k_indices:
                return []
                
            latest_10k_idx = ten_k_indices[0]
            accession_number = accession_numbers[latest_10k_idx].replace("-", "")
            
            # Get 10-K document
            doc_url = f"https://www.sec.gov/Archives/edgar/data/{cik}/{accession_number}/{accession_number}-index.htm"
            
            return [{
                "title": f"{company_name} - SEC 10-K Filing",
                "url": doc_url,
                "cik": cik,
                "accession_number": accession_number
            }]
            
        except Exception as e:
            logger.error(f"Error in SEC EDGAR retrieval: {str(e)}")
            return []

3. Azure AI Evaluation Integration

We implemented comprehensive evaluation to ensure high-quality outputs:

class EvaluationManager:
    """Manages evaluation of research reports using Azure AI Evaluation."""
    
    def __init__(self, apim_config):
        self.config = apim_config
        self.evaluators = self._initialize_evaluators()
        
    def _initialize_evaluators(self):
        """Initialize the custom APIM-compatible evaluators."""
        return {
            "groundedness": APIMGroundednessEvaluator(self.config),
            "relevance": APIMRelevanceEvaluator(self.config),
            "contextual_precision": APIMContextualPrecisionEvaluator(self.config),
            "faithfulness": APIMFaithfulnessEvaluator(self.config),
            "fluency": APIMFluencyEvaluator(self.config)
        }
        
    async def evaluate_report(self, report, query, context, evaluation_metrics=None):
        """Evaluate a research report against the given metrics."""
        if evaluation_metrics is None:
            evaluation_metrics = list(self.evaluators.keys())
            
        results = {}
        
        for metric_name in evaluation_metrics:
            if metric_name not in self.evaluators:
                logger.warning(f"Metric '{metric_name}' not available. Skipping.")
                continue
                
            evaluator = self.evaluators[metric_name]
            result = await evaluator(
                response=report,
                context=context,
                query=query
            )
            
            results[metric_name] = result
            
        return results

Technical Challenges and Solutions

Porting GPTResearcher to Semantic Kernel presented several challenges:

1. Asynchronous Processing

GPTResearcher was mostly synchronous, while enterprise environments benefit from asynchronous operations for better scalability:

Challenge: Converting the synchronous processing model to fully asynchronous.

Solution: We refactored the codebase to use async/await pattern throughout, implementing concurrent operations with asyncio.gather():

async def search_and_scrape(query: str) -> List[Document]:
    # Perform Bing search asynchronously
    search_results = await bing_search_client.search_async(query)
    
    # Concurrent web scraping
    async with aiohttp.ClientSession() as session:
        tasks = [scrape_url(session, result.url) for result in search_results]
        documents = await asyncio.gather(*tasks)
    
    return documents

2. Authentication Flow

Challenge: Integrating Azure AD token-based authentication and APIM headers.

Solution: We implemented authentication using DefaultAzureCredential with custom headers:

def _create_apim_client(self):
    # Get Azure AD token
    credential = DefaultAzureCredential()
    token = credential.get_token("https://cognitiveservices.azure.com/.default")
    
    # Create client with custom headers for APIM
    client = AzureOpenAI(
        azure_endpoint=self.config.endpoint,
        api_version=self.config.api_version,
        api_key=token.token,  # Using token as API key
    )
    
    # Add required APIM headers
    headers = {
        "mkl-User-name": self.config.username,
        "username": self.config.username
    }
    
    # Apply headers to all requests
    client = client.with_additional_headers(headers)
    return client

3. Evaluation Challenges

Challenge: Standard Azure AI Evaluation SDK doesn’t work with custom APIM authentication.

Solution: We developed custom evaluator classes and a manual evaluation loop:

async def run_manual_evaluation(test_scenarios, apim_config):
    """Run evaluation manually with custom APIM-aware evaluators."""
    
    # Initialize custom evaluators
    evaluators = {
        "groundedness": APIMGroundednessEvaluator(apim_config),
        "relevance": APIMRelevanceEvaluator(apim_config),
        "contextual_precision": APIMContextualPrecisionEvaluator(apim_config),
        # ... other evaluators
    }
    
    # Define thresholds
    thresholds = {
        "groundedness": 3.5,
        "relevance": 3.5,
        # ... other thresholds
    }
    
    # Process each test scenario
    all_results = []
    for scenario in test_scenarios:
        # Run the agent
        agent_response = await run_research_agent(
            query=scenario["query"],
            additional_context=scenario.get("additional_context", "")
        )
        
        # Evaluate with each metric
        scenario_results = {
            "query": scenario["query"],
            "response": agent_response,
            "metrics": {}
        }
        
        for metric_name, evaluator in evaluators.items():
            # Skip metrics not required
            if metric_name not in scenario.get("evaluation_metrics", list(evaluators.keys())):
                continue
                
            # Run evaluation
            result = await evaluator(
                response=agent_response,
                context=scenario.get("context", ""),
                query=scenario["query"]
            )
            
            # Store result
            scenario_results["metrics"][metric_name] = {
                "score": result["score"],
                "reasoning": result["reasoning"],
                "threshold": thresholds[metric_name],
                "pass": result["score"] >= thresholds[metric_name]
            }
        
        all_results.append(scenario_results)
    
    # Calculate overall metrics
    summary = calculate_evaluation_summary(all_results, thresholds)
    return all_results, summary

4. Multi-Agent Framework

Challenge: GPTResearcher was originally designed as a single-agent system. We needed to support multiple specialized agents.

Solution: We implemented an auto-agent system that selects the appropriate agent based on report type:

def choose_agent(report_type):
    """Choose the appropriate agent based on report type."""
    if report_type == ReportType.REFERRAL_TEMPLATE.value:
        return ReferralTemplateSpecialist()
    elif report_type == ReportType.DESCRIPTION_OF_OPERATIONS.value:
        return DescriptionOfOperationsAgent()
    elif report_type == ReportType.SAFETY_PROGRAM.value:
        return SafetyProgramSpecialist()
    elif report_type == ReportType.DEEP_RESEARCH.value:
        return DeepResearchSpecialist()
    elif report_type == ReportType.BASIC_REPORT.value:
        return BasicInformationAgent()
    else:
        return ComprehensiveResearchAgent()

Results and Performance Metrics

Our Semantic Kernel-based implementation has shown impressive results:

Quality Scores: The system consistently achieves high scores across evaluation metrics:
- Groundedness: 4.7/5 (94% pass rate)
- Relevance: 4.5/5 (92% pass rate)
- Contextual Precision: 4.2/5 (89% pass rate)
- Faithfulness: 4.6/5 (94% pass rate)
- Fluency: 4.8/5 (98% pass rate)
Processing Performance:
- Average research completion time: 43 seconds
- Asynchronous processing reduced research time by 65%
- Concurrent web scraping improved source collection by 78%
Scaling Capabilities:
- Successfully tested with 5,000+ concurrent users
- Scaled to processing 10,000+ research queries per day

Lessons Learned

Porting GPTResearcher to Semantic Kernel taught us several valuable lessons:

Documentation is Crucial: Comprehensive sequence diagrams and workflow documentation made the integration process much smoother.
Testing Infrastructure Matters: Implementing evaluation from the start ensured consistent quality throughout the development process.
Separation of Concerns: The plugin architecture of Semantic Kernel enforced good separation of concerns, making the system more maintainable.
Authentication Complexity: Enterprise authentication flows are complex and require careful design, especially with APIM integration.
Asynchronous by Default: Starting with an asynchronous design pattern from the beginning is easier than retrofitting later.

Conclusion

Porting GPTResearcher to Semantic Kernel transformed an excellent open-source tool into an enterprise-ready solution. The integration with Azure OpenAI, robust authentication, real-time updates, and comprehensive evaluation capabilities have made it a powerful asset for our enterprise workflows.

The architecture we’ve developed is not only powerful but also highly extensible, allowing for easy addition of new capabilities, retrievers, and report types as requirements evolve.

This project demonstrates the value of bridging the gap between innovative open-source AI tools and enterprise frameworks. By combining the research capabilities of GPTResearcher with the orchestration capabilities of Semantic Kernel, we’ve created a solution that delivers both cutting-edge AI and enterprise-grade reliability.

For anyone looking to integrate open-source AI capabilities into their enterprise environment, Semantic Kernel provides a robust framework for building secure, scalable, and maintainable solutions.