In [None]:

import pandas as pd

df = pd.read_csv("stanford_courses_cleaned_non_generic.csv", dtype=str)

One-shot generation of course outlines from each course's title and description
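As a minimal illustration of the mechanism used in the next cell, `string.Template` replaces `${NAME}` placeholders with values from a mapping and raises `KeyError` when a placeholder has no value. The toy template and the course values below are made up for this example; they are not the notebook's actual prompt.

```python
from string import Template

# Toy stand-in for the real prompt: one worked example, then the new request.
TOY_TEMPLATE = Template(
    'User: Write a course outline for "Example Course A".\n'
    "Model: 1. Introduction\n- Overview\n\n"
    'User: Write a course outline for "${COURSE_TITLE}" covering: "${COURSE_DESCRIPTION}".\n'
    "Model: "
)

prompt = TOY_TEMPLATE.substitute(
    {"COURSE_TITLE": "Intro to Maps", "COURSE_DESCRIPTION": "Projections and coordinates."}
)
print(prompt)

# substitute() is strict: a missing placeholder raises KeyError instead of
# silently leaving "${...}" in the prompt.
try:
    TOY_TEMPLATE.substitute({"COURSE_TITLE": "Intro to Maps"})
    missing_key_detected = False
except KeyError:
    missing_key_detected = True
```

Ending the prompt with an open `Model: ` turn is what invites the model to continue with a new outline in the same shape as the worked example.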

In [None]:
from string import Template

OUTLINE_TEMPLATE = Template("""Write a course outline for a textbook on \"The Global Positioning System: Where on Earth are We, and What Time is It?\" covering the following topics: \"Why people want to know where they are: answers include cross-Pacific trips of Polynesians, missile guidance, and distraught callers. How people determine where they are: navigation technology from dead-reckoning, sextants, and satellite navigation (GPS). Hands-on experience. How GPS works; when it does not work; possibilities for improving performance.\".
Model: 1. Introduction
- What is the Global Positioning System?
- Importance of GPS
- Overview of the course

2. Navigation technology
- Dead-reckoning
- Sextants
- Satellite navigation
- Comparison of technologies
- Hands-on experience with navigation technology

3. GPS technology
- How GPS works
    - Satellites
    - Ground receivers
    - Triangulation
- When GPS does not work
    - Blockage
    - Multipath
- Possibilities for improving performance

4. Applications of GPS
- Cross-Pacific trips of Polynesians
- Missile guidance
- Distraught callers
- Other applications of GPS

User: Write a course outline for a textbook on \"${COURSE_TITLE}\" covering the following topics: \"${COURSE_DESCRIPTION}\". Do not include assignments, exams or prerequisites.
Model: """)

In [None]:
courses_to_generate = []
# Build one outline prompt per course row.
for _, row in df.iterrows():
    prompt = OUTLINE_TEMPLATE.substitute(
        {"COURSE_TITLE": row["title"], "COURSE_DESCRIPTION": row["description"]}
    )
    courses_to_generate.append({
        "course_title": row["title"],
        "course_description": row["description"],
        "prompt": prompt,
    })

In [None]:
generations = [...]  # code to generate using the prompts here

In [None]:
for course, generation in zip(courses_to_generate, generations):
    course["outline"] = generation

# index=False avoids a stray unnamed index column when the file is read back later.
pd.DataFrame(courses_to_generate).to_csv("outlines_full.csv", index=False)

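(Very large) 2-shot prompt asking the model to flag outline sections that should be removed when cleaning up the generated outlines: faculty contact details, course logistics, and student evaluation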

In [None]:

OUTLINE_FILTER_TEMPLATE = Template("""The following is a course outline for a course on \"Anesthesia Operating Room Clerkship\". This outline needs to be anonymized and adapted to an online audience:
1.1 Introduction: Overview of the Anesthesia Operating Room Clerkship
1.2 Introduction: Objectives of the clerkship
1.3 Introduction: Prerequisites for the clerkship
2.1 Clinical settings: Sequoia Hospital in Redwood City
2.2 Clinical settings: Outpatient surgery centers throughout the community
2.3 Clinical settings: Exposure to general and regional anesthetic techniques
2.4 Clinical settings: Adult and pediatric patients
3.1 Personalized discussion: Applied physiology
3.2 Personalized discussion: Pharmacology
3.3 Personalized discussion: Pathophysiology of the surgical patient
3.4 Personalized discussion: Daily basis
3.5 Personalized discussion: Final paper to be submitted by the students
4.1 Transportation: Students need to arrange transportation to the various workplaces
5.1 Prerequisites: A major clerkship in medicine or surgery is strongly recommended
6.1 Periods available: 1-12, full-time for 2 weeks
6.2 Periods available: 1 student per period
7.1 Clerkship director and coordinator: Kurt Fink, M.D.
7.2 Clerkship director and coordinator: Yun Tao, 650-724-1706, yuntao@stanford.edu, Stanford Hospital
8.1 Reporting instructions: Contact Dr. Kurt Fink one week prior
8.2 Reporting instructions: Time: TBA
8.3 Reporting instructions: Call code: 0
9.1 Other faculty: Palo Alto Medical Clinic Anesthesiologist
10.1 Location: Palo Alto Medical Foundation.

Which of the sections of the outline contain: 
- private faculty members information (names or contact information)
- prerequisites, requirements, application processes, schedules or other practical course information not related to the course content
- assignments, final papers, exams, presentations or other student evaluation information
Falcon:
- private faculty members information (names or contact information): 7.1, 7.2, 8.1, 9.1
- prerequisites, requirements, application processes, schedules or other practical course information not related to the course content: 1.3, 4.1, 5.1, 6.1, 6.2, 8.1, 8.2, 8.3, 10.1
- assignments, final papers, exams, presentations or other student evaluation information: 3.5
User: The following is a course outline for a course on "Numerical Methods for Compressible Flows". This outline needs to be anonymized and adapted to an online audience:
1.1 Introduction: Overview of the course
1.2 Introduction: Importance of numerical methods for compressible flows
1.3 Introduction: Prerequisites for the course
2.1 Mathematical models for compressible flows: Hierarchy of mathematical models
2.2 Mathematical models for compressible flows: Ideal potential flow
2.3 Mathematical models for compressible flows: Transonic potential flow
3.1 Numerical methods for compressible flows: Finite difference methods
3.2 Numerical methods for compressible flows: Finite volume methods
3.3 Numerical methods for compressible flows: Finite element methods
4.1 Representative model problems: Shocks
4.2 Representative model problems: Expansions
5.1 Treatment of boundary conditions: Dirichlet boundary conditions
5.2 Treatment of boundary conditions: Neumann boundary conditions
6.1 Applications of numerical methods for compressible flows: Aerospace engineering
6.2 Applications of numerical methods for compressible flows: Other applications of numerical methods for compressible flows

Which of the sections of the outline contain: 
- private faculty members information (names or contact information)
- prerequisites, requirements, application processes, schedules or other practical course information not related to the course content
- assignments, final papers, exams, presentations or other student evaluation information
Falcon: 
- private faculty members information (names or contact information): None
- prerequisites, requirements, application processes, schedules or other practical course information not related to the course content: 1.3
- assignments, final papers, exams, presentations or other student evaluation information: None
User: The following is a course outline for a course on \"${COURSE_TITLE}\". This outline needs to be anonymized and adapted to an online audience:
${SECTIONS_LIST}

Which of the sections of the outline contain: 
- private faculty members information (names or contact information)
- prerequisites, requirements, application processes, schedules or other practical course information not related to the course content
- assignments, final papers, exams, presentations or other student evaluation information
Falcon: """)

Reformat the outlines into a numbered format ("section.unit Section title: Unit title")
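To make the target format concrete, here is a small sketch of the flattening step on a hand-written outline. It uses the same regular-expression idea as the cell below, written with `\d+` so that multi-digit section numbers are matched whole. The sample outline and the helper name `to_numbered_lines` are illustrative assumptions, not part of the original pipeline.

```python
import re

# A section is a numbered title line followed by one or more "- " bullet lines.
SECTION_RE = re.compile(r"\d+\. .*(?:\n\s*- .*)+")
TITLE_RE = re.compile(r"\d+\. (.*)")
UNIT_RE = re.compile(r"\n\s*- (.*)")

def to_numbered_lines(outline):
    """Flatten an outline into 'section.unit Section title: Unit title' lines."""
    lines = []
    for si, section in enumerate(SECTION_RE.findall(outline), start=1):
        title = TITLE_RE.search(section).group(1)
        for ui, unit in enumerate(UNIT_RE.findall(section), start=1):
            lines.append(f"{si}.{ui} {title}: {unit}")
    return lines

sample_outline = """1. Introduction
- Overview of the course
- Importance of GPS

2. GPS technology
- How GPS works
    - Satellites
- When GPS does not work"""

numbered = to_numbered_lines(sample_outline)
print("\n".join(numbered))
# 1.1 Introduction: Overview of the course
# 1.2 Introduction: Importance of GPS
# 2.1 GPS technology: How GPS works
# 2.2 GPS technology: Satellites
# 2.3 GPS technology: When GPS does not work
```

Note that indented sub-bullets such as "Satellites" are flattened to the same level as their parent bullet, because the bullet pattern accepts any amount of leading whitespace.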

In [None]:
import re

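# A section is a numbered title line followed by one or more "- " bullet lines.
# \d+ matches multi-digit section numbers (10, 11, ...) in full.
FIND_SECTIONS_REGEX = re.compile(r"\d+\. .*(?:\n\s*- .*)+")
FIND_TITLES_REGEX = re.compile(r"\d+\. (.*)")
# Bullets at any indentation are captured, so sub-bullets are flattened into units.
FIND_UNIT_TITLES_REGEX = re.compile(r"\n\s*- (.*)")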

def extract_sections(outline):
    sections = FIND_SECTIONS_REGEX.findall(outline)
    return [
        {
            "section_nr": si + 1,
            "title": FIND_TITLES_REGEX.search(section).group(1),
            "unit_titles": FIND_UNIT_TITLES_REGEX.findall(section),
        } for si, section in enumerate(sections)
    ]


df = pd.read_csv("outlines_full.csv", dtype=str)
for idx, row in df.iterrows():
    sections = extract_sections(row["outline"])
    # Flatten to lines such as "2.1 Navigation technology: Dead-reckoning".
    sections_list = "\n".join(
        f"{section['section_nr']}.{ui + 1} {section['title']}: {unit_title}"
        for section in sections
        for ui, unit_title in enumerate(section["unit_titles"])
    )
    prompt = OUTLINE_FILTER_TEMPLATE.substitute({"COURSE_TITLE": row["course_title"], "SECTIONS_LIST": sections_list})
    df.loc[idx, 'filter_outline_prompt'] = prompt
    df.loc[idx, 'filter_outline_result'] = generate... # actually generate the filter results

In [None]:
df.to_csv("outlines_full_filtered.csv", index=False)
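
The cells above store the raw model reply in `filter_outline_result`. A possible next step, sketched here under the assumption that replies follow the format of the few-shot examples (one `- category: ids` line per category, with `None` when nothing matches), is to extract the flagged section numbers and drop those lines from the numbered outline. The function names `parse_flagged_sections` and `drop_flagged` and the sample data are hypothetical; they are not part of the original notebook.

```python
import re

# Matches section ids such as "7.1" or "10.1"; a trailing "." as in "7.2." is ignored.
SECTION_ID_RE = re.compile(r"\b(\d+\.\d+)\b")

def parse_flagged_sections(reply):
    """Collect every section id listed after the last colon of each '- ...:' line."""
    flagged = set()
    for line in reply.splitlines():
        if not line.lstrip().startswith("- ") or ":" not in line:
            continue
        ids_part = line.rsplit(":", 1)[1]
        flagged.update(SECTION_ID_RE.findall(ids_part))
    return flagged

def drop_flagged(sections_list, flagged):
    """Keep only numbered outline lines whose leading id was not flagged."""
    return [
        line for line in sections_list.splitlines()
        if line.split(" ", 1)[0] not in flagged
    ]

# Made-up reply and outline in the same shape as the few-shot examples.
reply = """- private faculty members information (names or contact information): 7.1, 7.2., 8.1
- prerequisites, requirements or other practical course information: 1.3
- assignments, final papers, exams or other student evaluation information: None"""

sections_list = "\n".join([
    "1.1 Introduction: Overview",
    "1.3 Introduction: Prerequisites",
    "2.1 Topics: Physiology",
    "7.1 Staff: Director",
])

flagged = parse_flagged_sections(reply)
kept = drop_flagged(sections_list, flagged)
print(kept)
```

Because model replies are free text, a real pipeline would likely also need to handle replies that deviate from the expected format, for example by skipping or re-generating them.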