Computer Science Platform

Bachelor Thesis

Computer Science Platform

Author:

ALEXANDRU Dan

Supervisor:

Ass. Pr. PhD. IFTENE Adrian

November 27, 2018


“It seems to me that it’s through this machine that for the first time we’ll be able to have a one-to-one relationship between information source and information consumer.”

Isaac Asimov


Alexandru Ioan Cuza University of Iasi

Abstract

Faculty of Computer Science Department of Computer Science

Bachelor

Computer Science Platform by ALEXANDRU Dan

At the time of writing this thesis, Computer Science stands at the core of a learning revolution. Even so, formal education relies on antiquated platforms and methods, while informal, non-formal and professional education still seeks to establish a definitive resource.

This thesis aims to provide a middle ground between the two, so that an integrated and common approach to Computer Science education can be found.

We accomplish this by providing a modular microservice-oriented architecture that combines features and integrations from multiple projects such as Node.js, React, Python, Nvidia Docker and Jupyter, among others.

Through this platform, we enable workflows such as presenting reveal.js or Jupyter RISE slides, editing pages using Markdown, creating ephemeral Docker containers for labs and enabling persistent Docker containers that are deep-learning ready.


I thank my family for supporting me and Ass. Pr. PhD. IFTENE Adrian for helping me accomplish this thesis.


To my father.


State of the art

For a novel solution, inspiration must be drawn from multidisciplinary platforms.

As such, we will consider grouping the platforms by functionality rather than intended audience.

1.1 Learning platforms

We would classify the platforms we will discuss into the following categories:

• memory-based: Memrise, Duolingo

• exercise-based / project-based: Codecademy, Dataquest

• (competitive) programming-based: Infoarena, Campion.edu.ro, HackerRank

• visualization-based: Visualgo, Codingame

• course-based: edX, Coursera, Udacity, Pluralsight

• mixed platform (multiple features): Khan Academy, DataCamp

1.1.1 Memory-based platforms

“The Memrise community uses images and science to make learning easy and fun. Learn a language. Learn anything. ”

— Memrise tagline

FIGURE 1.1: Memrise Home Page

One of the most popular methods of learning is the flashcard method. A key exercise in memorizing various information, it works by memorizing a question and its answer using the opposite sides of a card.

Some platforms were launched for learning languages: you associate a word with a translation, with the gist that you learn a mnemonic to help you remember that information. This process might have been used for a long time (intuitively or learned), but applications such as Memrise or Duolingo helped solidify this process and make it more accessible and less strenuous compared to simple flashcard learning.

FIGURE 1.2: Memrise structures learning in 2 phases: seed and water (there used to be an intermediary phase, 24 hours after seeding).

Memrise is structured as a crowdsourced course site, most of the courses being made for teaching various languages. As such, you have various levels in a course (grouped by context), each level consisting of learning words and their respective translation.

Learning steps for Memrise:

(a) initial concept learning, called “seed”:

• associating words with translations;

• associating the above tuple with a mnemonic;

• reviewing while learning the combination of the three above.

(b) consolidating concepts, called “water”:

• testing the known words once in a while with various methods;

• if the word is not known, review the combination (and/or pick a new mnemonic);

• test again the known words in the current session, adding extra questions for user associations / translations that were mistaken (works similarly to exponential backoff).
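The backoff analogy can be illustrated with a minimal sketch. This is not Memrise's actual scheduling algorithm, which is not documented here; the `Card` class, the 4-hour base delay and the doubling factor are all assumptions chosen for illustration. Each correct answer multiplies the waiting time before the next test, while a mistake resets it so the item comes back soon.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """One word/translation pair; all fields are illustrative assumptions."""
    word: str
    translation: str
    interval_hours: float = 4.0  # assumed delay before the first "water" test
    streak: int = 0              # consecutive correct answers


def review(card: Card, correct: bool, base_hours: float = 4.0,
           factor: float = 2.0, max_hours: float = 24.0 * 180) -> Card:
    """Update the review delay of a card after one answer."""
    if correct:
        card.streak += 1
        # Back off: every success multiplies the wait before the next test.
        card.interval_hours = min(card.interval_hours * factor, max_hours)
    else:
        # A mistake resets the schedule so the item is tested again soon.
        card.streak = 0
        card.interval_hours = base_hours
    return card
```

With these assumed defaults, a card answered correctly twice is next tested after 16 hours, and a single mistake brings it back to 4 hours.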

1.1.2 Exercise-based platforms

In the past 10 years, this method has become very popular with learning sites (eg: Khan Academy, started 2006). The reasons for this are simple:

• practical - exercises actually teach the skills the course professes

• consolidation - to avoid the danger of memorizing solutions (rote memorization, applying Stack Overflow solutions without understanding/reshaping/improving them), some platforms offer projects as a solution to this

• controlled failure - the course creators might put bigger challenges or intended mistakes throughout the course in order to make students comfortable with temporary setbacks

• thorough repetition - persistent and consistent incremental steps outside the comfort zone yield mastery

FIGURE 1.3: Codecademy example exercise page.

Consolidation works through (spaced) repetition (different contexts, similar solutions) and through applied projects:

• Codecademy has projects for various web technologies

• Dataquest offers Guided Projects, which come with instructions to follow that only hint at what to analyze; DataCamp employs a similar solution

• Dataquest also offers Projects (portfolio-oriented projects, similar to Udacity’s approach, with the difference that Udacity offers only project upload, not the context where you code that project)

These platforms have the same structure for pages: lesson/exercise requirements/hints, IDE and output.

The lesson component contains the concept taught with some further explanations, the exercise requirements (they can be single-step, multi-step, sequential multi-step) and hints (if the user chooses so).

The solution input used varies by the lesson type. Usually it’s an IDE (eg: Javascript, Python), other times it can be a bash session (or REPL in a bash session). Another differentiator here would be if the input is for one file (eg: python solution), or if it’s done for multiple files/folders (eg: git solution, web solution).

1.1.3 Competitive programming platforms

Most platforms under this category have a large variety of problems to solve, but are not updated in terms of languages you can use, IDE, visualization of problems.

Also, explanations for algorithms or original solutions are scarce (due in part to the small population and the fear of memorization of solutions, which could set an uneven playfield for contests) or rarely encouraged through solution sharing (after some form of locking that problem solution, i.e. not being able to edit it further).

FIGURE 1.4: HackerRank example exercise page.

In this context, most platforms here are very similar in interface and functionality, slightly different in problem sets, with most of them focused on training students for ACM (or similar) competitions.

Still, here we can include HackerRank and Codingame, both with a very distinct feel and very particular features.

HackerRank

By content, HackerRank is a classical competitive programming platform, but with some significant differences: a friendlier interface, a profile page with exercise streaking (promotes persistent learning through gamification) similar to a Github profile, competitions held by users/HackerRank/sponsors (similar to how TopCoder or Codeforces operate) and varied training content (from mathematics to regular expressions) with per-exercise ranking. You can also get “hackos” by solving problems (points to buy test cases and check where your algorithm failed).

HackerRank also offers the possibility to participate in Project Euler math challenges as an indefinite competition.

Codingame

Codingame is even more niche, by providing users with visualizations (working under a verbose mode with more data or a strictly visual mode) for their problems.

Of course, they provide a lot of training problems and even recurrent few-player competitions (“Clash of Code”).

As far as solutions go, heuristics are usually preferred for competitions as the simpler solutions, which can yield surprisingly good results, but will not grant admission into higher leagues (top 500 contestants out of thousands). This allows for bigger participation in contests and a more rewarding experience for most beginners.

FIGURE 1.5: Codingame example exercise page.

1.1.4 Visualization platforms

There are few examples of such platforms due to this fundamental problem: good interactive visualizations take time to make and few people know the technologies to do so.

Among these few, Visualgo stands out by having visualizations for the most frequent data structures and algorithms.

Alternatives: Galles algorithms visualizations (Galles, 2011) or some resources from the enjalot/algovis repo (Johnson, 2017).

1.1.5 Course-based platforms

FIGURE 1.6: Visualgo example exercise page.

These platforms focus on delivering good courses: edX and Coursera on almost exclusively academic courses, Udacity on project-based “nanodegrees” (for certifications in various domains where a student might need a portfolio, eg: Android Development, Deep Learning) and Pluralsight or similar platforms for professionals who want to upskill.

Usually, these kinds of platforms have videos between 5 minutes and 30 minutes, sections in the course pages where you download the materials and a forum.

Project submission and peer grading is done with edX, Coursera, Udacity (being also the ones that offer the verified certification and course timelines for syncing between course staff and students). These are commonly called MOOCs (Massive Open Online Course, as coined by Cormier, 2008).

FIGURE 1.7: Pluralsight course dashboard page.

1.1.6 Mixed platforms

Mixed platforms combine traits from previous platforms.

Prime examples for this are Khan Academy and DataCamp.

Khan Academy has: courses, automatically-generated math exercises, code assignments, streaking and very detailed tracking of class assignments and skills.

DataCamp has: courses, code assignments, streaking and minimal tracking of class assignments, projects, podcasts, community-driven courses and tutorials.

FIGURE 1.8: How lessons are structured on Khan Academy.

FIGURE 1.9: Datacamp exercises for the Network Analysis course.

1.2 Common features

1.2.1 Lessons

Depending on the platform, lessons are usually in the form of:

• videos

• short quizzes

• introductory/exploratory exercises

Videos

FIGURE 1.10: Coursera lesson - Andrew Ng’s "Intro to Machine Learning" course

By length, videos can be classified as “bite-sized” (< 4min), introductory (< 10min, most videos) or lecture (> 30min).

The other classification we would make: taped courses and adapted courses.

Taped courses are most commonly found on MIT’s OpenCourseWare service. They are more often than not recordings of faculty courses, though sometimes Q&A sessions have been added and more explanations might have been given, so as to avoid confusion for online viewers (to compensate for the lack of the on-campus experience). Usually they are 30 minutes to 60-70 minutes long.

Some examples of taped courses: MIT 6.034 Artificial Intelligence, Fall 2010 (MIT OCW).

Adapted courses are usually found on edX and Coursera. They are still somewhat traditional in teaching methods and academic in content, but they have been reduced to bite-sized lessons, and are organized less like sequential lectures and more like clips that are ordered for each module. Modules can be watched out of order (though usually there are constraints like homework, and courses are published only 1-2 weeks in advance). Usually they are 10-20 minutes long.

Some examples of adapted courses: Andrew Ng’s "Intro to Machine Learning" (Coursera), Harvard’s "CS50" (edX).

Intro quizzes

Usually introductory quizzes work either as refreshers (review previously studied concepts after some time) or intuition checks (make some introduction to a chapter and check if the student can anticipate the answers).

Introductory exercises

Split into 2 types: lesson slide (exercise with an introductory assignment or with some definitions) and experimentation-free slide (some sample code provided, you can modify the code to see what it returns).

Usually both these types are pass-through exercises (you only need to read them, no need to do any assignment and can safely go to the next exercise).

1.2.2 Exercises

What actually counts as an exercise (as far as progress in the course is concerned) usually appears as:

• single-step or multi-step instructions (which can be done in any order or are sequential)

• quiz - review of recently learned concepts

1.2.3 Projects

Of the various project definitions within the studied platforms, defining a project as a Jupyter notebook appeared to be the most appealing, due to the mixed nature of writing in it: you can combine markdown with various programming languages and visualizations.

FIGURE 1.11: Datacamp Projects

1.2.4 Gamification

— Deterding et al., 2011

Gamification can be accomplished with any of these tools:

• points / xp (almost all platforms I’ve mentioned)

• badges (Codecademy, Khan Academy)

• certification (almost all)

• streaking - days streaked, recently finished, listed skills on profile (and the date when they were obtained) (almost all)

• progress - percent (most) or boxes (Khan Academy, HackerRank)

• multiple career/skill/subject tasks (almost all, but for career only Pluralsight, Datacamp and Dataquest offer these)

FIGURE 1.12: Codecademy user profile: XP, streaking, badges, progress

1.2.5 Classes, Profile

To promote the social aspect of learning, learning platforms offer users the option to showcase their skills, projects and certifications on their profile. Another important feature is to show leaderboards in some form.

FIGURE 1.13: Khan Academy User Profile gamification - badges, streaks, points

1.3 Inspiration

Of the features in the state-of-the-art examples above, some elements could be integrated/adapted to the context of this work:

• side menus - edx/coursera/udemy/codingame;

• general interface - codingame/codepen/codecademy/datacamp;

• unique link and sharing schemes;

• dynamic notebooks - jupyter.


Chapter 2

Technologies used

While this list isn’t intended to be exhaustive, these are some of the critical technologies in this application:

• React

• Jupyter

• Node.js

• Python

• Docker

• AWS

2.1 React

FIGURE 2.1: React logo

Description

React is an open-source MV* (model-view) library made by Facebook.

Advantages, trade-offs

Advantages for using React are:

• very small overhead (as such, faster render and update)

• tree-based update

• reusable components

• Chrome DevTools support

Trade-offs:

• react-router doesn’t have some authentication functionalities built-in, you are either given the choice to write them from some boilerplate or use redux

• state updates should be reviewed carefully

• JSX syntax requires adjustment coming from html and templates

Usage

React has been used because of its very small overhead and very high extensibility. It ensures a clean approach to modularization, sustainable data flow (through the use of the Flux / Redux libraries and corresponding patterns), many possible integrations and an isolated scope for components (making a plugin-based platform easier).

Using an MVC framework such as Angular.js (with which the project was started some time ago and from which it has since fully transitioned) would be unwise. Version 1 has a good chance of bugs in some specific update scenarios, has been deprecated and would not receive support anymore. Version 2 and newer have been significantly changed, with even more scaffolding necessary and more markup (slower in certain scenarios), and as such overhead on multiple phases: development, deployment etc.

2.2 Jupyter

FIGURE 2.2: Jupyter logo

Description

Jupyter is a web notebook that enables making documents that combine code, markdown, visualizations and makes prototyping and learning faster.

Advantages, trade-offs

Advantages:

• being a web notebook, you can easily dockerize it and deploy it on an AWS server and connect to it remotely (compared to older desktop-based solutions, this factor already weighs heavily in the decision to use it)

• supports multiple “kernels” (different programming languages need different middleware for supporting them in Jupyter, this middleware is the Jupyter kernel)

• native plugin support - eg: RISE (for reveal.js presentations)

• support for parallel computing (directly through ipyparallel)

• easy to share - it saves graphs made inside it, you can export it to latex, html, pdf or render it online on dedicated services and even on Github

Trade-offs:

• variable state is shared, easy to lose track of it

• not versioning-friendly - ipynb (the notebook format) is a json and all graphs rendered in it are saved as base64 (there is, though, a solution such as nbdime - Jupyter, 2015)

Usage

Motivation

2.3 Node.js

Description

Node.js is a Javascript runtime built on Chrome’s V8 Javascript engine.

FIGURE 2.3: Node.js logo

Advantages, trade-offs

Advantages:

• a lot of scaffolding available, fast prototyping

• almost all web-related technologies are easily integratable with Node.js (through package managers such as npm or yarn) or have a compatible bridge/client for Node.js

• async by design

• allows lean microservice architectures (through module importing and usual project structure), encourages flat hierarchies

Trade-offs:

• async js syntax might not be intuitive for anyone starting to learn Node.js

• npm installs are sometimes slow

Usage

In this project, Node has been used two-fold: for the react client and the node server.

The react client was built using create-react-app, which provides a bootstrapped React app without the need to configure the build tooling (such as Webpack) by hand.

Motivation

Node.js is one of the most flexible backend runtimes and, because it shares its language and package ecosystem with React, it allowed the fastest prototyping of a backend for the React client.

2.4 Python

FIGURE 2.4: Python logo

Description

Python is an interpreted high-level programming language that is designed for scripting and fast application development, for diverse use cases (from scripting to web development to deep learning).

Advantages, trade-offs

Some of the advantages of Python:

• simple syntax

• reference implementation is CPython (written in C); its very good integration with C/C++ allows many Python libraries to rely on low-level implementations, usually in C and Fortran (eg: numpy is based on BLAS/Lapack) => the end result is that, for most workflows, Python compromises little on speed in exchange for ease of use and optimization

• libraries for various workflows: from AWS scripting, docker, to data science

• de-facto standard for certain use cases: data science (alongside R), deep neural networks, big data (alongside Scala)

• integration with various backends: Spark (PySpark)

• mixed learning through Jupyter notebooks (most support for Python, but various additional kernels are supported)

Trade-offs:

• GIL (Global Interpreter Lock) - parallelization by multithreading is difficult: because CPython’s memory management is not thread-safe, only one thread executes Python bytecode at a time

• memory management issues (garbage collector not as refined as in Java)

• pip dependencies for python should be carefully handled - only one version of a package can be installed at a time (if you keep native pip and don’t use a 3rd party solution)

Usage

In this platform, Python has been used as much as possible for extensible, easy to maintain microservices.

Motivation

Python is a great prototyping language, it has libraries and frameworks for extremely diverse purposes, and as such can cover a lot of scenarios we’ve used it for.

In general, Python has been used as the main logic for microservices in this application.

2.5 Docker

FIGURE 2.5: Docker logo

Description

Docker is a container platform written in Go that allows for extensible, maintainable, distributable containers.

Advantages, trade-offs

Advantages:

• easy deployment

• lightweight

• ephemeral use and easily disposable

• isolation of OS-level dependencies (when using OS docker images, eg: ubuntu:16.04) => allows a minimally configured host to run workflows as complex as running deep neural networks on nvidia-docker

• isolation of package-level dependencies (when using eg: guest numpy vs host numpy) => allows for complete installs

• fast iteration - subsequent image builds from a Dockerfile are being cached and as such are much faster

Trade-offs:

• Docker isolates its containers’ ports from the host scope: all ports have to be first exposed from the guest, and then forwarded to

of an image will not be significantly slower

• when lacking a tagging policy, every new build tagged latest takes the tag away from the previous image => you will end up with a lot of untagged images, with the name and size as the only indicators (but given the name is unique, it will most probably be deleted from previous images)

Usage

Docker has been used extensively in this thesis:

• Every tool, server or client is wrapped in a docker container

• Most of these container images are pushed to an on-premise deployment of docker registry (which is itself deployed in a docker container)

• The platform uses Docker containers for ephemeral compute

2.6 AWS

FIGURE 2.6: AWS logo

Description

Amazon Web Services (AWS) is one of the most popular cloud providers (and the oldest one).

Advantages, trade-offs

Advantages:

• small starting costs, no on-premise servers costs and maintenance

• no upfront charge, fundamental cloud tenets such as “pay as you go” are respected

• one of the smallest prices for compute especially with spot instances

• various types of instances for any use (CPU-intensive, GPU-intensive, memory-intensive or specialized, such as f1 instance for FPGAs)

• few quota locks that are disabled with use / time

Trade-offs:

• vendor lock-in => mitigated by using non-AWS specific technologies that are easily integratable in AWS, alongside a clear and simple docker architecture

• high costs for GPU instances, spot instances can spike during high demand

• any SaaS/PaaS paid solution provided by AWS / AWS Market- place is significantly more expensive than IaaS

• dependent on AMI images, without the possibility of other types of virtualization


Usage

AWS has been used for deploying the platform, ephemeral compute instances and CI (Continuous Integration).

Motivation

AWS at this point has one of the most mature sets of services, has a cheaper TCO (especially with spot instances), is well documented and its APIs are stable (they aren’t changed significantly, which would make older services difficult to maintain).

To avoid vendor lock-in, a lot of tools have been built to mimic the AWS API for convenience (eg: Minio), or to be able to be deployed on multi-cloud configurations (eg: Kubernetes, Minio).


Chapter 3

Platform architecture

The platform has been built with the following constraints in mind:

• as few configurations on the host as possible => eg: minimal drivers (eg: Nvidia drivers for nvidia-docker), docker config, monitoring ...

• deployment should be isolated (as few external dependencies as possible after initial setup, package managers and docker registry should be self-sufficient) => local docker registry, caching of package managers

3.1 Application architecture

Standard workflow is: the user accesses the React client (deployed on a Node server), which calls the Node server for auth or minimal services, which in turn will call the Python server (which should do most of the heavy lifting, to avoid coupling the layers above).

The Python server will create ephemeral Docker containers when needed.
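As an illustration of how the Python server might start such a container, the sketch below only assembles the `docker run` command line. The function name, the container name and the image are hypothetical and not taken from the platform's code. The `--runtime=nvidia` option assumes that nvidia-docker 2 is installed on the host.

```python
from typing import Dict, List, Optional


def docker_run_command(image: str, name: str,
                       ports: Optional[Dict[int, int]] = None,
                       gpu: bool = False) -> List[str]:
    """Build the argv for a detached container that is removed when it stops."""
    # --rm makes the container ephemeral, -d runs it in the background.
    cmd = ["docker", "run", "--rm", "-d", "--name", name]
    if gpu:
        # Assumption: nvidia-docker 2 registered the "nvidia" runtime on the host.
        cmd.append("--runtime=nvidia")
    # Forward each host port to the corresponding container port.
    for host_port, container_port in sorted((ports or {}).items()):
        cmd += ["-p", f"{host_port}:{container_port}"]
    cmd.append(image)
    return cmd
```

The resulting list can be passed to `subprocess.run`; keeping the command assembly separate from its execution makes this step easy to test without a Docker daemon.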

FIGURE 3.1: Application architecture.

3.2 On-premise deployment architecture

Per the constraints above, every module in the platform is dockerized (i.e. the module is built and run inside a container), with a local docker registry for that VM/machine providing the needed images after the initial setup.

It is recommended to run the compute-heavy or ephemeral docker containers on a distinct VM/machine, such that the main application and the correlated containers are not strained / blocked.

Ideally, all the containers on that machine should be orchestrated by k8s (Kubernetes) or Docker Swarm. Solutions like Portainer can be used to allow students or admins easier deployment of Docker containers on the platform.

FIGURE 3.2: Application docker architecture.

3.3 Further improvements

Depending on the load on the app, there are 2 scaling options: scale up (bigger servers) or scale out (more servers). In that context, various degrees of coupling might be accepted.

The fully decoupled option - the VM structure should be as follows:

• React

• Node

• database VMs - mongo / minio

• Python

• ephemeral Docker controller

• (multiple) spot instances for ephemeral Dockers

The half coupled option, as in figure 3.2.


Chapter 4

Development

The initial development for this platform was sparked by our observation that many existing tools were disjoint in their use, while platforms picked just a few of these features and focused mostly on content.

FIGURE 4.1: One of the early demos demonstrating the use of various integrations

4.1 Tools developed

During the development of this platform, most features were first implemented separately as modules (written in pure Javascript), and were afterwards integrated in the React application.

As such, this application favored the use of microservices and modularization from the start.

4.1.1 Markdown Editor tool

For course pages, editing documentation (eg: READMEs) needed a Markdown Editor. By combining the Ace Editor and the Showdown parser we got the result in figure 4.2.

FIGURE 4.2: Markdown Editor tool

4.1.2 Reveal Importer tool

For presentations made on slides.com, importing them and especially versioning them is a hassle (injected styling and scripts in a page, themes used are not public, even though reveal.js is open-source etc.).

We’ve made an importer tool which parses the resulting files from slides.com. With it, you can compare the original file to the simplified file and check for visual differences (both are loaded in iframes; events in the tool window are captured and sent to the children iframes).

FIGURE 4.3: Reveal Importer tool

4.1.3 Modularization

Every module of the application (the clients and servers, databases and tooling containers) has been dockerized, and Docker images are also available for separate modules. As such, the solution consists of an online platform, an easy on-premise deployment with Docker and packaging of separate tools for offline work.

FIGURE 4.4: Some of the Docker containers created or pulled

4.2 Interface

FIGURE 4.5: The homepage for the application

FIGURE 4.6: Route listing all the components available as of the writing of this thesis.

FIGURE 4.7: Using Katacoda for on-demand labs (that can be used both live in faculty or at home).

container. So anything related to HPC (High-Performance Computing) that would be enabled by CUDA (amongst other technologies) or deep learning wouldn’t be possible.

Through the use of CUDA and CuDNN-enabled Ubuntu images, we can accomplish just that, with a solution that:

• gracefully avoids vendor lock-in by using open-source technologies - Docker, Jupyter and configurations for it and various pip packages

• has most python dependencies needed for linear algebra (numpy, scipy), visualization (matplotlib, seaborn), machine learning (scikit-learn), deep learning (keras, tensorflow, pytorch)

• is cheap to deploy - the built docker image has 3.6GB, approximately the size of a standard VM (virtual machine) image.

FIGURE 4.8: Leveraging nvidia-docker and Jupyter notebooks for deep learning labs

As stated before, Jupyter is capable of integrating various workflows and can be used for various learning activities: presentations, visualizations, learning documents, analysis.

FIGURE 4.9: A Jupyter notebook running in an nvidia docker enabled container, displaying a RISE presentation with Markdown, Python and matplotlib visualization

4.3 Use cases

Below we have described some common use cases for the platform and what workflows are enabled by it (including future workflows).

4.3.1 Course pages

Users can edit and display markdown pages on cs-platform.

Multiple alternatives are given:

• upload manual or statically generated html site

• edit markdown in real time

4.3.2 Course review and memorizing aids

Due to the AutoCourse Builder feature, course pages will feature learning materials from both teachers and students, summaries, quick quizzes and mindmaps.

4.3.3 Lessons / Presentations

Users can build lessons or presentations with multiple tools:

• create and edit on slides.com and import to cs-platform

• upload or edit on cs-platform (from slides.com or pure reveal.js)

• edit and present from a minimal docker container with Jupyter and RISE

4.3.4 Labs

Labs can be created through either:

• pick docker container that you want to run (new one or already-created one)

• VNC or openssh to desired container (be mindful that it must be exposed to the internet)

• the teacher using and presenting from a demo container

• local and then deployed through webhooks to katacoda, which will then be embedded in the platform

• local and then deployed through webhooks directly to cs-platform

• edit on cs-platform and serve lab from it

4.3.6 Project submission

Projects can be submitted by students on select pages. The workflow is as follows:

• upload on dropzone page (project will be saved both as zip and unzipped form to minio)

• the platform will display the project for both student and teacher

• the platform will check the project for any irregularities
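The thesis does not enumerate which irregularities are checked. As a minimal sketch under that caveat, the function below inspects an uploaded archive before it is unzipped and reports two assumed problems: entries whose paths would escape the extraction folder, and an uncompressed size above a hypothetical limit. The function name and the 200 MB default are illustrative only.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import List


def find_irregularities(zip_bytes: bytes,
                        max_total_bytes: int = 200 * 1024 * 1024) -> List[str]:
    """Return human-readable problems found in an uploaded project archive."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
    except zipfile.BadZipFile:
        return ["not a valid zip archive"]

    problems: List[str] = []
    total = 0
    with archive:
        for info in archive.infolist():
            path = PurePosixPath(info.filename)
            # Absolute paths or ".." segments could write outside the target folder.
            if path.is_absolute() or ".." in path.parts:
                problems.append(f"unsafe path: {info.filename}")
            total += info.file_size
    if total > max_total_bytes:
        problems.append(f"uncompressed size {total} exceeds limit {max_total_bytes}")
    return problems
```

An empty list means the archive passed these two checks; any further checks (file types, plagiarism, build status) would be added in the same way.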

4.3.7 Project statistics

Concurrently with the above workflow, on upload of the project the git microservices will analyse the git metadata of the project.
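A minimal sketch of such an analysis is shown below, assuming the uploaded project contains its `.git` folder and that `git` is available on the server. The function names are hypothetical and the platform's actual microservice may work differently. The log is requested as tab-separated fields (commit hash, author name, author timestamp) and then aggregated per author.

```python
import subprocess
from collections import Counter
from typing import Dict

# Tab-separated fields: commit hash, author name, author date (Unix timestamp).
LOG_FORMAT = "--pretty=format:%H%x09%an%x09%at"


def read_git_log(repo_path: str) -> str:
    """Run `git log` inside the unzipped project; requires git on the PATH."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", LOG_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def commits_per_author(log_output: str) -> Dict[str, int]:
    """Count commits per author from the output of read_git_log."""
    counts: Counter = Counter()
    for line in log_output.splitlines():
        if not line.strip():
            continue
        # Split from both ends so a tab inside an author name does not break parsing.
        _commit, rest = line.split("\t", 1)
        author, _timestamp = rest.rsplit("\t", 1)
        counts[author] += 1
    return dict(counts)
```

Separating the parsing from the `git` call lets the statistics be computed on stored log output as well, for instance when the same project is analysed again later.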

4.3.8 Distance Learning

Distance learning is enabled by the following platform features:

• AutoCourse Builder - additional resources for any subject by automatic classification and ranking

• reveal.js presentations - access to both teacher and student presentations for any subject (provided the crowdsourcing condition is encouraged at that specific course)

• docker playgrounds

• docker labs

• VNC access to AWS instances, on-campus computers or a local machine (if away from home)

4.4 Future Improvements

I’ve listed below some started features that can significantly improve this project.

4.4.1 Interface improvements

• use of headless React CMS (API based CMSs are easier to integrate with a pre-existing app) or build our own

• use of service workers for some cs-platform features (such as markdown editor)

4.4.2 Infrastructure improvements

• for the nvidia-docker - some packages either take a long time to install or have heavy downloads (eg: gensim, nltk corpora, pytorch downloading wheels for statically-linked CUDA/CuDNN etc.) - a complete image for this container should ideally be built and saved in the docker registry, and investment should be made in package caching (pip, apt-get) and image caching (during build, successive container images are cached)

• nvidia-docker should also include package managers like Anaconda (especially miniconda for custom builds) - eg: bokeh datashader is ideally installed through conda

• RBAC should be reflected across the layers (React, Node, Python)

• multitenant scenario should be considered (here Postgres seems to be a better option in principle than Mongo)

• React security (for plugin components) and VNC security should be considered

4.4.3 Case study: AutoCourse Builder

Through this feature, senior students can upload a zip with their resources from faculty (notes, grades, courses, labs, books, various implementations). Based on this, the platform will do the following:

• automatically build the course page from the cross-referenced resources (multiple students uploading archives that may be from different years, have different content)

• perform document analysis on the courses - a good review of available techniques for information extraction (“NLP Techniques for Extracting Information”)

A correlated problem would be that of modelling student knowledge and recall.

A classic solution is the SuperMemo algorithm created by Piotr Wozniak (SuperMemo, 2016). It builds on the Leitner system, assuming that the recall curve of the student decreases logarithmically and that spaced repetition (with a frequency established by a metric; several are discussed in SuperMemo, 2015) should occur to retain that knowledge.
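To make the scheduling idea concrete, the following is a minimal sketch of SM-2, an early and publicly documented variant of the SuperMemo family. It is not necessarily the variant described in the cited reference, and the names CardState and sm2_review are illustrative only. Whether the interval is multiplied by the easiness before or after the update varies between implementations; this sketch uses the value before the update.

```python
from dataclasses import dataclass


@dataclass
class CardState:
    """Scheduling state for one learning item (hypothetical structure)."""
    repetitions: int = 0      # consecutive successful reviews
    interval_days: int = 0    # days to wait before the next review
    easiness: float = 2.5     # SM-2 starting easiness factor


def sm2_review(state: CardState, quality: int) -> CardState:
    """Return the next state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the easiness.
        return CardState(repetitions=0, interval_days=1, easiness=state.easiness)

    # Fixed first two intervals, then multiply by the current easiness.
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        interval = round(state.interval_days * state.easiness)

    # SM-2 easiness update, floored at 1.3.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    easiness = max(1.3, state.easiness + delta)
    return CardState(state.repetitions + 1, interval, easiness)
```

Starting from CardState() and calling sm2_review after each study session yields the number of days to wait before the item is shown again; a grade below 3 sends the item back to the start of the sequence.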

A newer solution would be Deep Knowledge Tracing (abbrev. DKT, Piech et al., 2015), defined as modelling “the knowledge of a student as they interact with coursework” (source: ibidem). This RNN (i.e. Recurrent Neural Network) architecture was trained on a Khan Academy sample, and can predict student responses and model conditional influence between exercises.
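As an illustration of the kind of input such a model consumes, the sketch below encodes each (exercise, correctness) interaction as a one-hot vector of length 2M, where M is the number of distinct exercises. The exact index layout (correct answers offset by M) and the function names are assumptions made for this example; the recurrent network itself is not shown.

```python
from typing import List, Sequence, Tuple


def encode_interaction(exercise_id: int, correct: bool, num_exercises: int) -> List[int]:
    """One-hot encode a single interaction into a vector of length 2 * num_exercises."""
    if not 0 <= exercise_id < num_exercises:
        raise ValueError("exercise_id out of range")
    vector = [0] * (2 * num_exercises)
    # Assumed layout: first half for incorrect answers, second half for correct ones.
    offset = num_exercises if correct else 0
    vector[exercise_id + offset] = 1
    return vector


def encode_history(history: Sequence[Tuple[int, bool]], num_exercises: int) -> List[List[int]]:
    """Encode a student's chronological interaction history, one vector per time step."""
    return [encode_interaction(q, a, num_exercises) for q, a in history]
```

A sequence produced by encode_history would be fed to the RNN one time step at a time, with the network's output at each step interpreted as the predicted probability of answering each exercise correctly.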


(Grover, 2017) - once we build the resources from the previous feature, how can we know which resources to recommend to users?

One solution would be to take data from two sources: how long users have studied particular resources displayed on the platform (through mouse events and the JavaScript Page Visibility API (Mozilla, 2018), you can track whether the user is focused on the page; demo: Marzullo, 2014), and their bookmarks (given proper motivation to sync them with the platform). We can thus gather enough data to make recommendations based on user interests.
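A minimal sketch of how such data could drive recommendations is given below, assuming the platform already aggregates, per user, the number of seconds each resource was in focus. It uses a simple user-based collaborative filtering scheme with cosine similarity; the data layout and the function names are hypothetical, and a production system would need normalisation and far more data.

```python
import math
from typing import Dict, List

# user -> resource -> seconds the resource was in focus (assumed aggregate)
Interactions = Dict[str, Dict[str, float]]


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse focus-time vectors."""
    dot = sum(a[r] * b[r] for r in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user: str, interactions: Interactions, top_n: int = 3) -> List[str]:
    """Rank resources the user has not seen by similarity-weighted focus time."""
    seen = interactions.get(user, {})
    scores: Dict[str, float] = {}
    for other, resources in interactions.items():
        if other == user:
            continue
        similarity = cosine_similarity(seen, resources)
        if similarity <= 0:
            continue  # ignore users with no overlapping interests
        for resource, seconds in resources.items():
            if resource not in seen:
                scores[resource] = scores.get(resource, 0.0) + similarity * seconds
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda r: (-scores[r], r))[:top_n]
```

Bookmarks could be folded into the same structure, for example by adding a fixed bonus to the focus time of every bookmarked resource before computing similarities.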

4.4.5 Case study: Traefik

A newer alternative to nginx is Traefik, a Go tool built especially for Docker containers and orchestration solutions (k8s, Swarm, Rancher, Mesos), which can be used mainly as a reverse proxy.

FIGURE 4.10: Using Traefik as a reverse proxy for the microservices in the platform

4.4.6 Case study: Monaco Editor

Instead of using ACE Editor for everything, Monaco Editor can be used for specific use cases (e.g. diffing versions of a file). This tool is produced by Microsoft, was open-sourced in 2016, and is the editor used in Visual Studio Code.

4.4.7 Case study: Git/Github analysis

Project submission is an important feature, and a thorough analysis of the git metadata should be valuable to both evaluators and students. I’ve made a demo using the GitHub API, though libgit2 should also be considered (a library used by GitHub in its backend for this, among other uses).


FIGURE 4.11: Using Monaco Editor instead of ACE Editor for specific use cases

FIGURE 4.12: GitHub metadata analysis

Rate limiting might be an issue, and as such GraphQL should be considered for more meaningful queries.
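As a sketch of what a more meaningful query could look like, the code below builds a single GraphQL request body that fetches the recent commit history of a repository's default branch, and then summarises a response of that shape by author. It performs no network call; sending it would require a POST to the GitHub GraphQL endpoint with an authentication token. The helper names and the choice of fields are assumptions made for illustration, not the demo described above.

```python
import json
from collections import Counter
from typing import Any, Dict

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# One request replaces several REST calls (repository, branch, commits, stats).
COMMIT_HISTORY_QUERY = """
query($owner: String!, $name: String!, $count: Int!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: $count) {
            nodes {
              committedDate
              additions
              deletions
              author { name }
            }
          }
        }
      }
    }
  }
}
"""


def build_request_body(owner: str, name: str, count: int = 50) -> str:
    """Serialize the query and its variables as the JSON body of a POST request."""
    variables = {"owner": owner, "name": name, "count": count}
    return json.dumps({"query": COMMIT_HISTORY_QUERY, "variables": variables})


def commits_per_author(response: Dict[str, Any]) -> Counter:
    """Count commits per author name in a decoded response of the shape queried above."""
    target = response["data"]["repository"]["defaultBranchRef"]["target"]
    nodes = target["history"]["nodes"]
    return Counter((node.get("author") or {}).get("name") or "unknown" for node in nodes)
```

Because only the requested fields are returned, one such request can cover what would otherwise take many REST calls, which is the main reason it helps with rate limits.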


Chapter 5

Conclusions

The work we have done with cs-platform tries to enable workflows and teaching methods that in the past have been either:

• heterogeneous - spread across multiple platforms / solutions

• impossible - especially before Docker, Jupyter and Nvidia-Docker were available

• expensive - using the AWS Deep Learning AMI requires 80+ GB of EBS (block storage), compared to our solution, which requires 4 GB and is highly configurable

We hope that, through these reusable components and their composition and orchestration, we can create a full-fledged solution that in the following years would be used by multiple institutions of higher learning.


Bibliography

[1] Dave Cormier. The CCK08 MOOC – Connectivism course, 1/4 way. (accessed June 28, 2018). 2008. URL: http://davecormier.com/edblog/2008/10/02/the-cck08-mooc-connectivism-course-14-way/.

[2] Sebastian Deterding et al. “From Game Design Elements to Gamefulness: Defining "Gamification"”. In: Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments. MindTrek ’11. Tampere, Finland: ACM, 2011, pp. 9–15. ISBN: 978-1-4503-0816-8. DOI: 10.1145/2181037.2181040. URL: http://doi.acm.org/10.1145/2181037.2181040.

[3] David Galles. Data Structure Visualization. (accessed June 28, 2018). 2011. URL: https://www.cs.usfca.edu/~galles/visualization/Algorithms.html.

[4] Prince Grover. “Various Implementations of Collaborative Filtering”. (accessed June 28, 2018). 2017. URL: https://towardsdatascience.com/various-implementations-of-collaborative-filtering-100385c6dfe0.

[5] Ian Johnson. Data Structure Visualization. (accessed June 28, 2018). 2017. URL: https://github.com/enjalot/algovis.

[6] Jupyter. jupyter/nbdime - Tools for diffing and merging of Jupyter notebooks. (accessed June 28, 2018). 2015. URL: https://github.com/jupyter/nbdime.

[7] Jonathan Marzullo. “Using HTML5 Visibility API with GSAP...” (accessed June 28, 2018). 2014. URL: https://codepen.io/jonathan/pen/sxgJl.

[8] Mozilla. “Page Visibility API - Web APIs”. (accessed June 28, 2018). 2018. URL: https://developer.mozilla.org/en-US/docs/Web/API/Page_Visibility_API.

[9] Paul Nelson. “NLP Techniques for Extracting Information”. (accessed June 28, 2018). URL: https://www.searchtechnologies.com/blog/natural-language-processing-techniques.

[10] Chris Piech et al. “Deep Knowledge Tracing”. In: CoRR abs/1506.05908 (2015). arXiv: 1506.05908. URL: http://arxiv.org/abs/1506.05908.

[11] SuperMemo. “SuperMemo Algorithm”. (accessed June 28, 2018). 2016. URL: http://help.supermemo.org/wiki/SuperMemo_Algorithm.

[12] SuperMemo. “SuperMemo Spaced Repetition Metric”. (accessed June 28, 2018). 2015. URL: http://supermemopedia.com/wiki/Spaced_repetition_algorithm_metric.


Contents

Chapter 1

State of the art

1.1 Learning platforms

1.2 Common features

1.3 Inspiration

Chapter 2

Technologies used

2.1 React

2.2 Jupyter

2.3 Node.js

2.4 Python

2.5 Docker

2.6 AWS

Chapter 3

Platform architecture

3.1 Application architecture

3.2 On-premise deployment architecture

3.3 Further improvements

Chapter 4

Development

4.1 Tools developed

4.2 Interface

4.3 Use cases

4.4 Future Improvements

Chapter 5

Conclusions

Bibliography