
Security Crawl Maze: An Open Source Tool to Test Web Security Crawlers

Friday, June 21, 2019

Scanning modern web applications for security vulnerabilities can be a difficult task, especially if they are built with JavaScript frameworks, which is why crawlers have to use a multi-stage crawling approach to discover all the resources on modern websites.

Living in a time of dynamically changing specifications and the constant appearance of new frameworks, we often have to adjust our crawlers so that they are able to discover new ways in which developers can link resources from their applications. The issue we face in such situations is measuring whether changes to the crawling logic improve effectiveness. While working on replacing a crawler for a web security scanner that has been in use for a number of years, we found we needed a universal test bed, both to test our current capabilities and to discover cases that are currently missed. Inspired by Firing Range, today we’re announcing the open-source release of Security Crawl Maze – a universal test bed for web security crawlers.

Security Crawl Maze is a simple Python application built with the Flask framework that contains a wide variety of cases for the ways in which a web-based application can link other resources on the Web. We also provide a Dockerfile which allows you to build a Docker image and deploy it to an environment of your choice. While the initial release covers the most important cases for HTTP crawling, it’s a subset of what we want to achieve in the near future. You’ll soon be able to test whether your crawler is able to discover known files (robots.txt, sitemap.xml, etc.) or crawl modern single-page applications written with the most popular JS frameworks (Angular, Polymer, etc.).
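To give a flavor of what a test case looks like, here is a minimal sketch in the same spirit (the paths and handlers below are made up for illustration and are not the project’s actual cases): a crawler should discover /test/found.html whether the link appears in an anchor tag or only in JavaScript.

#!/usr/bin/env python
# Hypothetical test cases in the spirit of Security Crawl Maze, not the
# project's actual code: the same target resource is linked in two ways.
from flask import Flask

app = Flask(__name__)

@app.route('/test/anchor-href.html')
def anchor_href():
    # Classic link: every crawler should find this one.
    return '<a href="/test/found.html">link</a>'

@app.route('/test/js-window-location.html')
def js_window_location():
    # JavaScript navigation: only crawlers that execute JS will find it.
    return '<script>window.location = "/test/found.html";</script>'

if __name__ == '__main__':
    app.run()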

Security crawlers are mostly interested in code coverage rather than content coverage, which means their deduplication logic has to be different. This is why we plan to add cases that let you test whether your crawler deduplicates URLs correctly (e.g. blog posts, e-commerce). If you believe a case is missing, feel free to add a test case for it – it’s super simple! The code is available on GitHub and through a publicly deployed version.

We hope that others will find it helpful in evaluating the capabilities of their crawlers, and we certainly welcome any contributions and feedback from the broader security research community.

By Maciej Trzos, Information Security Engineer

How we brought the latest version of Python to App Engine and Cloud Functions

Monday, August 13, 2018

At Cloud Next 2018, we added Python 3.7 support to Cloud Functions and now we’ve announced Python 3.7 support for the App Engine standard environment. These new runtimes allow you to write Python functions and apps using the latest version of Python and the rich ecosystem of packages available on the Python Package Index (PyPI).

This new runtime marks a significant update to App Engine and was enabled by new open source software that we recently released: gVisor and FTL.

Python, straight from the source

Running Python 3.7 on App Engine and Cloud Functions required us to fundamentally rethink our infrastructure. Traditionally, meeting Google Cloud’s security requirements meant that we had to run a modified version of the Python interpreter. However, using a modified interpreter constrained some language features and only allowed us to support a limited set of whitelisted Python libraries.

Thanks to gVisor, a container sandbox that provides improved security and process isolation, we can now run the unmodified Python 3.7.0 interpreter. We’ve done extensive testing to make sure Python 3.7 is compatible with gVisor. As part of our compatibility testing, we run Python’s full suite of language tests, and tests for Python packages that are popular on PyPI. We’re committed to ensuring that everything you’ve come to know and love about Python is supported on our platform.

Seamless deployments

Most importantly, this change in our infrastructure makes it easier to take advantage of Python’s vast ecosystem. As a developer, you just add project dependencies to a requirements.txt file and deploy.
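As a minimal sketch (the file layout follows the standard App Engine Python 3.7 structure; the Flask handler itself is just an illustration), a deployable app needs little more than this:

# main.py -- a minimal App Engine (Python 3.7 standard environment) app.
# Next to it you would have:
#   requirements.txt  with a line such as:  Flask==1.0.2
#   app.yaml          containing:           runtime: python37
# Deploying with `gcloud app deploy` then lets the build step fetch and
# install the dependencies listed in requirements.txt alongside the app.
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello from Python 3.7!'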

During deployment, FTL, a tool for building containers, fetches dependencies listed in your requirements.txt file and installs them alongside your app or function. FTL also includes a short-lived dependency cache, which speeds up repeated deployments if no changes are detected in your requirements.txt file. This is particularly useful if you just need to re-deploy because you found a typo.

Keeping up with the Pythonistas

In making these changes, we also decided to expand the list of system packages that are included with each runtime’s Ubuntu 18.04 distribution. We think that will make life just a little bit easier for developers working with the latest release of Python.

Looking forward, we’re excited about how these changes will allow us to keep up with the Python community’s progress as they release new versions and libraries. Please let us know what you think and if you run into any challenges.

You can learn more about how to get started with it on App Engine and Cloud Functions in our documentation. We can’t wait to see what you build with Python 3.7.

By Stewart Reichling, Product Manager

Tangent: Source-to-Source Debuggable Derivatives

Wednesday, November 8, 2017

Crossposted on the Google Research Blog

Tangent is a new, free, and open source Python library for automatic differentiation. In contrast to existing machine learning libraries, Tangent is a source-to-source system, consuming a Python function f and emitting a new Python function that computes the gradient of f. This allows much better user visibility into gradient computations, as well as easy user-level editing and debugging of gradients. Tangent comes with many more features for debugging and designing machine learning models.
This post gives an overview of the Tangent API. It covers how to use Tangent to generate gradient code in Python that is easy to interpret, debug and modify.

Neural networks (NNs) have led to great advances in machine learning models for images, video, audio, and text. The fundamental abstraction that lets us train NNs to perform well at these tasks is a 30-year-old idea called reverse-mode automatic differentiation (also known as backpropagation), which comprises two passes through the NN. First, we run a “forward pass” to calculate the output value of each node. Then we run a “backward pass” to calculate a series of derivatives to determine how to update the weights to increase the model’s accuracy.

Training NNs, and doing research on novel architectures, requires us to compute these derivatives correctly, efficiently, and easily. We also need to be able to debug these derivatives when our model isn’t training well, or when we’re trying to build something new that we do not yet understand. Automatic differentiation, or just “autodiff,” is a technique to calculate the derivatives of computer programs that denote some mathematical function, and nearly every machine learning library implements it.

Existing libraries implement automatic differentiation by tracing a program’s execution (at runtime, like TF Eager, PyTorch and Autograd) or by building a dynamic data-flow graph and then differentiating the graph (ahead-of-time, like TensorFlow). In contrast, Tangent performs ahead-of-time autodiff on the Python source code itself, and produces Python source code as its output.
As a result, you can finally read your automatic derivative code just like the rest of your program. Tangent is useful to researchers and students who not only want to write their models in Python, but also read and debug automatically-generated derivative code without sacrificing speed and flexibility.

You can easily inspect and debug your models written in Tangent, without special tools or indirection. Tangent works on a large and growing subset of Python, provides extra autodiff features other Python ML libraries don’t have, is high-performance, and is compatible with TensorFlow and NumPy.

Automatic differentiation of Python code

How do we automatically generate derivatives of plain Python code? Math functions like tf.exp or tf.log have derivatives, which we can compose to build the backward pass. Similarly, pieces of syntax, such as subroutines, conditionals, and loops, also have backward-pass versions. Tangent contains recipes for generating derivative code for each piece of Python syntax, along with many NumPy and TensorFlow function calls.

Tangent has a one-function API:
import tangent
df = tangent.grad(f)
Here’s an animated graphic of what happens when we call tangent.grad on a Python function:
If you want to print out your derivatives, you can run
import tangent
df = tangent.grad(f, verbose=1)
Under the hood, tangent.grad first grabs the source code of the Python function you pass it. Tangent has a large library of recipes for the derivatives of Python syntax, as well as TensorFlow Eager functions. The function tangent.grad then walks your code in reverse order, looks up the matching backward-pass recipe, and adds it to the end of the derivative function. This reverse-order processing gives the technique its name: reverse-mode automatic differentiation.
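To make that concrete, here is a hand-written sketch of the kind of code this process produces for a tiny function (illustrative only, not actual Tangent output): every statement in the forward function gets a matching derivative statement, emitted in reverse order.

def f(x):
    a = x * x        # forward: a = x**2
    b = a + 3.0 * x  # forward: b = a + 3x
    return b

def df_by_hand(x):
    # Backward pass: walk the statements of f in reverse order, composing
    # the derivative "recipe" of each one.
    db = 1.0                 # seed gradient for the output b
    da = db                  # from b = a + 3x
    dx = 3.0 * db            # from the 3x term in b
    dx = dx + 2.0 * x * da   # from a = x * x
    return dx

print(df_by_hand(2.0))  # d/dx (x**2 + 3x) = 2x + 3, so this prints 7.0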

The function df above only works for scalar (non-array) inputs; Tangent also supports array computations through the TensorFlow Eager functions it knows how to differentiate.
Although we started with TensorFlow Eager support, Tangent isn’t tied to one numeric library or another—we would gladly welcome pull requests adding PyTorch or MXNet derivative recipes.

Next Steps

Tangent is open source now at github.com/google/tangent. Go check it out for download and installation instructions. Tangent is still an experiment, so expect some bugs. If you report them to us on GitHub, we will do our best to fix them quickly.

We are working to add support in Tangent for more aspects of the Python language (e.g., closures, inline function definitions, classes, more NumPy and TensorFlow functions). We also hope to add more advanced automatic differentiation and compiler functionality in the future, such as automatic trade-off between memory and compute (Griewank and Walther 2000; Gruslys et al., 2016), more aggressive optimizations, and lambda lifting.

We intend to develop Tangent together as a community. We welcome pull requests with fixes and features. Happy deriving!

By Alex Wiltschko, Research Scientist, Google Brain Team

Acknowledgments

Bart van Merriënboer contributed immensely to all aspects of Tangent during his internship, and Dan Moldovan led TF Eager integration, infrastructure and benchmarking. Also, thanks to the Google Brain team for their support of this post and special thanks to Sanders Kleinfeld and Aleks Haecky for their valuable contribution for the technical aspects of the post.

Introducing Python Fire, a library for automatically generating command line interfaces

Thursday, March 2, 2017

Today we are pleased to announce the open-sourcing of Python Fire. Python Fire generates command line interfaces (CLIs) from any Python code. Simply call the Fire function in any Python program to automatically turn that program into a CLI. The library is available from PyPI via `pip install fire`, and the source is available on GitHub.

Python Fire will automatically turn your code into a CLI without you needing to do any additional work. You don't have to define arguments, set up help information, or write a main function that defines how your code is run. Instead, you simply call the `Fire` function from your main module, and Python Fire takes care of the rest. It uses inspection to turn whatever Python object you give it -- whether it's a class, an object, a dictionary, a function, or even a whole module -- into a command line interface, complete with tab completion and documentation, and the CLI will stay up-to-date even as the code changes.

To illustrate this, let's look at a simple example.

#!/usr/bin/env python
import fire

class Example(object):
  def hello(self, name='world'):
    """Says hello to the specified name."""
    return 'Hello {name}!'.format(name=name)

def main():
  fire.Fire(Example)

if __name__ == '__main__':
  main()

When the Fire function is run, our command will be executed. Just by calling Fire, we can now use the Example class as if it were a command line utility.

$ ./example.py hello
Hello world!
$ ./example.py hello David
Hello David!
$ ./example.py hello --name=Google
Hello Google!

Of course, you can continue to use this module like an ordinary Python library, enabling you to use the exact same code both from Bash and Python. If you're writing a Python library, then you no longer need to update your main method or client when experimenting with it; instead you can simply run the piece of your library that you're experimenting with from the command line. Even as the library changes, the command line tool stays up to date.

At Google, engineers use Python Fire to generate command line tools from Python libraries. We have an image manipulation tool built by using Fire with the Python Imaging Library, PIL. In Google Brain, we use an experiment management tool built with Fire, allowing us to manage experiments equally well from Python or from Bash.

Every Fire CLI comes with an interactive mode. Run the CLI with the `--interactive` flag to launch an IPython REPL with the result of your command, as well as other useful variables already defined and ready to use. Be sure to check out Python Fire's documentation for more on this and the other useful features Fire provides.

Between Python Fire's simplicity, generality, and power, we hope you find it a useful library for your own projects.

By David Bieber, Software Engineer on Google Brain

Grumpy: Go running Python!

Wednesday, January 4, 2017

Google runs millions of lines of Python code. The front-end server that drives youtube.com and YouTube’s APIs is primarily written in Python, and it serves millions of requests per second! YouTube’s front-end runs on CPython 2.7, so we’ve put a ton of work into improving the runtime and adapting our application to work optimally within it. These efforts have borne a lot of fruit over the years, but we always run up against the same issue: it's very difficult to make concurrent workloads perform well on CPython.

To solve this problem, we investigated a number of other Python runtimes. Each had trade-offs and none solved the concurrency problem without introducing other issues.

So we asked ourselves a crazy question: What if we were to implement an alternative runtime optimized for real-time serving? Once we started going down the rabbit hole, Go seemed like an obvious choice of platform since its operational characteristics align well with our use case (e.g. lightweight threads). We wanted first class language interoperability and Go’s powerful runtime type reflection system made this straightforward. Python in Go felt very natural, and so Grumpy was born.

Grumpy is an experimental Python runtime for Go. It translates Python code into Go programs, and those transpiled programs run seamlessly within the Go runtime. We needed to support a large existing Python codebase, so it was important to have a high degree of compatibility with CPython (quirks and all). The goal is for Grumpy to be a drop-in replacement runtime for any pure-Python project.

Two design choices we made had big consequences. First, we decided to forgo support for C extension modules. This means that Grumpy cannot leverage the wealth of existing Python C extensions but it gave us a lot of flexibility to design an API and object representation that scales for parallel workloads. In particular, Grumpy has no global interpreter lock, and it leverages Go’s garbage collection for object lifetime management instead of counting references. We think Grumpy has the potential to scale more gracefully than CPython for many real world workloads. Results from Grumpy’s synthetic Fibonacci benchmark demonstrate some of this potential:



Second, Grumpy is not an interpreter. Grumpy programs are compiled and linked just like any other Go program. The downside is less development and deployment flexibility, but it offers several advantages. For one, it creates optimization opportunities at compile time via static program analysis. But the biggest advantage is that interoperability with Go code becomes very powerful and straightforward: Grumpy programs can import Go packages just like Python modules! For example, the Python snippet below uses Go’s standard net/http package to start a simple server:

from __go__.net.http import ListenAndServe, RedirectHandler

handler = RedirectHandler('http://github.com/google/grumpy', 303)
ListenAndServe('127.0.0.1:8080', handler)

We’re excited about the prospects for Grumpy. Although it’s still alpha software, most of the language constructs and many core built-in types work like you’d expect. There are still holes to fill — many built-in types are missing methods and attributes, built-in functions are absent and the standard library is virtually empty. If you find things that you wish were working, file an issue so we know what to prioritize. Or better yet, submit a pull request.

Stay Grumpy!

By Dylan Trotter, YouTube Engineering

Open source down under: Linux.conf.au 2017

Wednesday, December 28, 2016

It’s a new year and open source enthusiasts from around the globe are preparing to gather at the edge of the world for Linux.conf.au 2017. Among those preparing are Googlers, including some of us from the Open Source Programs Office.

This year Linux.conf.au is returning to Hobart, the riverside capital of Tasmania, home of Australia’s famous Tasmanian devils, running five days between January 16 and 20.
Tuz, a Tasmanian devil sporting a penguin beak, is the Linux.conf.au mascot.
(Artwork by Tania Walker licensed under CC BY-SA.)
The conference, which began in 1999 and is community organized, is well equipped to explore the theme, "the Future of Open Source," which is reflected in the program schedule and miniconfs.

You’ll find Googlers speaking (listed below) as well as participating in the hallway track. Don’t miss our Birds of a Feather session if you’re a student, educator, project maintainer, or otherwise interested in talking about outreach and student programs like Google Summer of Code and Google Code-in.

Monday, January 16th
12:20pm The Sound of Silencing by Julien Goodwin
1:20pm   An Open Programming Environment Inspired by Programming Games by Josh Deprez

Tuesday, January 17th
All day    Community Leadership Summit X at LCA

Wednesday, January 18th
2:15pm   Community Building Beyond the Black Stump by Josh Simmons

Thursday, January 19th
4:35pm   Using Python for creating hardware to record FOSS conferences! by Tim Ansell

Friday, January 20th
1:20pm   Linux meets Kubernetes by Vishnu Kannan

Not able to make it to the conference? Keynotes and sessions will be livestreamed, and you can always find the session recordings online after the event.

We’ll see you there!

By Josh Simmons, Open Source Programs Office

Budou: Automatic Japanese line breaking tool

Friday, October 21, 2016

Today we are pleased to introduce Budou, an automatic line breaking tool for Japanese. What is a line breaking tool and why is it necessary? English uses spacing and hyphenation as cues to allow for beautiful, that is, more legible, line breaks. Japanese, which has none of these, is notoriously more difficult; breaks occur randomly, usually in the middle of a word.

This is a long-standing issue in Japanese typography on the web, and it degrades readability. We can specify the places where line breaks may occur with CSS, but this is a non-trivial manual process that requires Japanese vocabulary and knowledge of grammar.


Budou automatically translates Japanese sentences into organized HTML code with meaningful chunks wrapped in non-breaking markup so as to semantically control line breaks. Budou uses the Cloud Natural Language API to analyze the input sentence, and it concatenates proper words in order to produce meaningful chunks utilizing PoS (part-of-speech) tagging and syntactic information. Budou outputs HTML code by wrapping the chunks in SPAN tags. By specifying the chunks' display property as inline-block in CSS, semantic units are no longer split at the end of a line.
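A rough sketch of that wrapping step (illustrative only: the chunk segmentation itself comes from the Cloud Natural Language API, and the class names below are made up):

# Given chunks already segmented into meaningful units, wrap each one in a
# SPAN so the browser may break lines between chunks but never inside them.
def wrap_chunks(chunks):
    spans = ''.join(
        '<span class="chunk">{}</span>'.format(chunk) for chunk in chunks)
    return '<span class="budou">{}</span>'.format(spans)

# Matching CSS (class names are made up): .chunk { display: inline-block; }
print(wrap_chunks(['今日は', '良い', '天気です。']))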

Budou is a simple Python script that runs each sentence through the Cloud Natural Language API. It can easily be extended as a custom filter for template engines, or as a task for runners such as Grunt and Gulp. The latest version also caches the response so no duplicate requests are sent. If you are using Budou for a static website, you can process your HTML code before deployment.

Budou is intended for relatively short sentences such as titles and headings. Screen readers may read a sentence chunk by chunk when it is wrapped in SPAN tags or split by WBR tags, so using Budou for body paragraphs is discouraged.

As of October 2016, the Cloud Natural Language API supports English, Spanish, and Japanese, and Budou currently only supports Japanese. Support for other Asian languages with line break issues, such as Chinese and Thai, will be added as the API adds support.

Any comments and suggestions are welcome. You can find us on GitHub.

By Shuhei Iitsuka, UX Engineer

Jsonnet: a more elegant language for composing JSON

Monday, April 20, 2015

A few months ago, we quietly released Jsonnet: a simple yet rich configuration language (i.e., a programming language for specifying data). Many systems can be configured with JSON, but writing it by hand is troublesome. Jsonnet is packed with useful data-specification features that expand into JSON for other systems to act upon. Below is a trivial example of such expansion:

// Jsonnet Example
{
   person1: {
       name: "Alice",
       welcome: "Hello " + self.name + "!",
   },
   person2: self.person1 { name: "Bob" },
}

The above expands to the following JSON:

{
  "person1": {
     "name": "Alice",
     "welcome": "Hello Alice!"
  },
  "person2": {
     "name": "Bob",
     "welcome": "Hello Bob!"
  }
}
Jsonnet doesn’t just generate JSON: Jsonnet is also an extension of JSON. By adding new constructs between the gaps of existing JSON syntax, Jsonnet adds useful features without breaking backwards compatibility. Any valid JSON is also a valid Jsonnet program that simply emits that JSON unchanged, and existing systems that consume JSON (or its cousin YAML) can be easily modified to accept data in the full Jsonnet language. As such, Jsonnet is an example of a templating language, but one specifically designed for JSON data and less error-prone than other techniques.
“Jsonnet” is a portmanteau of JSON and sonnet. We chose that name to convey that data expressed in Jsonnet is easier to write and maintain because it is more elegant and concise, like a poem. This is not just due to syntactic niceties like comments and permissive quotes/commas, but because Jsonnet has all the modern multi-paradigm programming language conveniences needed to manage complexity. One key benefit is the ability to use Jsonnet's mixin and import features to write modular configuration template libraries, allowing the creation of domain-specific configuration languages for particular applications.
Most configuration languages are created ad hoc for the needs of a given application, accruing features over time and becoming unwieldy. From day one, Jsonnet was designed as a coherent programming language, benefitting from both academic techniques and our experience implementing production languages. Unlike most configuration languages, Jsonnet has a full operational semantics, ensuring matching behavior from third party implementations as well as mathematical analysis. It is a very small and carefully chosen extension to JSON that can express both object-oriented and declarative styles. More importantly, unlike regular programming languages, Jsonnet is hermetic: its evaluation is independent of any implicit environmental factors, ensuring that high-level configuration will resolve to the same thing every time.
Jsonnet is open source. It’s currently available as a library with C and Python bindings, and also as a command line utility. A real-world example configuration can be found on the website, where 217 lines (9.7kB) of Jsonnet expand into 740 lines (25kB) of configuration for other tools. Learn more about Jsonnet by reading the tutorial and experimenting with our JavaScript demo!
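As a quick sketch of driving it from Python (assuming the PyPI “jsonnet” package, which exposes the C library as a _jsonnet module with an evaluate_snippet function):

import json
import _jsonnet  # assumed: the Python binding shipped with Jsonnet

source = '''
{
  person1: { name: "Alice", welcome: "Hello " + self.name + "!" },
  person2: self.person1 { name: "Bob" },
}
'''

# evaluate_snippet returns the expanded JSON as a string.
expanded = _jsonnet.evaluate_snippet('example.jsonnet', source)
print(json.loads(expanded)['person2']['welcome'])  # -> Hello Bob!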


by Dave Cunningham, New York Technical Infrastructure team

How to format Python code without really trying

Monday, March 30, 2015

Years of writing and maintaining Python code have taught us the value of automated tools for code formatting, but the existing ones didn’t quite do what we wanted. In the best traditions of the open source community, it was time to write yet another Python formatter.

YAPF takes a different approach to formatting Python code: it reformats the entire program, not just individual lines or constructs that violate a style guide rule. The ultimate goal is to let engineers focus on the bigger picture and not worry about the formatting. The end result should look the same as if an engineer had worried about the formatting.

You can run YAPF on the entire program or just a part of the program. It’s also possible to flag certain parts of a program which YAPF shouldn’t alter, which is useful for generated files or sections with large literals.
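For example (a sketch based on YAPF's documented usage, not an excerpt from the project), a hand-aligned literal can be protected with a trailing comment while the rest of the file is reformatted from the command line:

# Reformat a whole file in place:
#   $ pip install yapf
#   $ yapf --in-place my_module.py
#
# Keep YAPF away from a block you have aligned by hand:
LOOKUP_TABLE = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]  # yapf: disable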

Consider this horribly-formatted code:

x = {  'a':37,'b':42,

'c':927}

y = 'hello ''world'
z = 'hello '+'world'
a = 'hello {}'.format('world')
class foo  (     object  ):
 def f    (self   ):
   return       \
37*-+2
 def g(self, x,y=42):
     return y
def f  (   a ) :
 return      37+-+a[42-x :  y**3]

YAPF reformats this into something much more consistent and readable:

x = {'a': 37, 'b': 42, 'c': 927}

y = 'hello ' 'world'
z = 'hello ' + 'world'
a = 'hello {}'.format('world')


class foo(object):
    def f(self):
        return 37 * -+2

    def g(self, x, y=42):
        return y


def f(a):
    return 37 + -+a[42 - x:y ** 3]

Head to YAPF's GitHub page for more information on how to use it, and take a look at YAPF’s own source code to see a much larger example of the output it produces.

by Bill Wendling, YouTube Code Health Team

GSoC students create a Google Compute Engine interface to CloudStack

Wednesday, July 23, 2014

Today on the Open Source blog we have guest writer Sebastien Goasguen, an avid open source contributor and member of the Apache Software Foundation. Below, Sebastien highlights the significant contributions that two Google Summer of Code students have made to Apache CloudStack.

In December 2013, Google announced the General Availability (GA) of its public cloud, Google Compute Engine (GCE). Apache CloudStack now has a brand new GCE-compatible interface (Gstack) which allows users to take advantage of the GCE clients (i.e. gcloud and gcutil) to access their CloudStack cloud. This interface was made possible through the Google Summer of Code (GSoC) program.

In the summer of 2013, Ian Duffy, a student from Dublin City University, participated in GSoC through the Apache Software Foundation and worked on an LDAP plugin for CloudStack. He did such a great job that he finished early and was made an Apache CloudStack committer. Since he finished his primary GSoC project so early, I encouraged him to take on another! He brought in a friend for the ride — Darren Brogan, another student at Dublin City University. Together they worked on the GCE interface to CloudStack and even learned Python in doing so.

Both Ian and Darren remained engaged with the CloudStack community, and as their third-year project at university they successfully developed an Amazon EC2 interface to CloudStack. Since he enjoyed his experience so much, Darren also applied to the GSoC 2014 program, proposing to revisit Gstack, improve it, extend the unit tests, and make it compatible with the GCE v1 API. He is making excellent progress so far and we are all excited to see the results.

Technically, Gstack is a Python Flask application that provides a REST API compatible with the GCE API and forwards the requests to the corresponding CloudStack API. The source is available on GitHub and the binary is downloadable via PyPI.

Installation and Configuration of Gstack

Are you interested in using Gstack? Check out the full documentation. To get a taste for things, you can grab the binary package from PyPI using pip in a single command.

        pip install gstack

Or if you plan to explore the source and work on it, you can clone the repository and install it by hand. Pull requests are of course welcome.

   git clone https://github.com/NOPping/gstack.git
   sudo python ./setup.py install

Both of these installation methods will install a gstack and a gstack-configure binary in your path. Before running Gstack you must configure it. To do so run:

   gstack-configure

And enter your configuration information when prompted. You will need to specify the host and port where you want gstack to run on, as well as the CloudStack endpoint that you want gstack to forward the requests to. In the example below we use the exoscale cloud:

   $ gstack-configure

   gstack bind address [0.0.0.0]: localhost
   gstack bind port [5000]:
   Cloudstack host [localhost]: api.exoscale.ch
   Cloudstack port [8080]: 443
   Cloudstack protocol [http]: https
   Cloudstack path [/client/api]: /compute

The information will be stored in a configuration file available at ~/.gstack/gstack.conf:

   $ cat ~/.gstack/gstack.conf

   PATH = 'compute/v1/projects/'
   GSTACK_BIND_ADDRESS = 'localhost'
   GSTACK_PORT = '5000'
   CLOUDSTACK_HOST = 'api.exoscale.ch'
   CLOUDSTACK_PORT = '443'
   CLOUDSTACK_PROTOCOL = 'https'
   CLOUDSTACK_PATH = '/compute'

You are now ready to start Gstack in the foreground with:

   gstack

That's all there is to running Gstack. You can then use gcutil to send requests to Gstack, which will forward them to a CloudStack endpoint. Although it is still a work in progress, it is now compatible with the GCE GA v1.0 API. It provides a solid base to start working on hybrid solutions between the GCE public cloud and a CloudStack-based private cloud.

GSoC has been a terrific opportunity for all of us at Apache. Darren and Ian both learned how to work with an open source community and ultimately became an integral part of it. They learned tools like JIRA, git, and Review Board and gained confidence working publicly on mailing lists. Their work on Gstack and EC2stack is certainly of high value to CloudStack and could eventually become the base for interesting products that will use hybrid clouds.

By Sebastien Goasguen, Senior Open Source Architect, Citrix and Apache Software Foundation member

Babbage: easily encode or decode data with a click

Monday, April 7, 2014

Engineers at Google deal with encoded data on a daily basis. It’s very common to handle files encoded in a variety of different formats. For example, email attachments are Base64 encoded and web requests are URL encoded. Custom encodings bring another level of complication especially when different codings are chained together. Over time this constant need to encode / decode data left me with a large, unmanageable collection of scripts. This collection was simply not scaling, so I set off to create a better solution. We needed something easy to use and extensible enough to serve our future needs.

Today, I’m happy to introduce Babbage, an open source tool for manipulating data in many different formats. With Babbage you can easily decode or encode data with just a click. Paste in “SGVsbG8h”, select Base64 decode, and you get “Hello!”. You can paste in text to process with plugins (which are an easy way to transform data). Babbage comes with a basic set of plugins to cover simple encodings and obfuscation techniques such as Base64, URL encoding, XOR and others. If you have something a bit more complicated, you can chain multiple plugins together. Babbage is open source and written so that anyone can create their own collection of plugins with the libraries already in use.
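For reference, the transformations Babbage chains together are ordinary ones; here is what the Base64 example and a simple two-step chain look like with nothing but the Python standard library (this is not Babbage’s own API):

import base64
from urllib.parse import quote, unquote

# The example from above: Base64-decode "SGVsbG8h".
print(base64.b64decode('SGVsbG8h').decode('utf-8'))        # -> Hello!

# Chaining: data that was Base64-encoded and then URL-encoded is
# recovered by applying the decoders in reverse order.
wrapped = quote(base64.b64encode(b'Hello!'))
print(base64.b64decode(unquote(wrapped)).decode('utf-8'))  # -> Hello!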

Babbage was written in Python and JavaScript with Google Closure on top of Google App Engine. The full source code is available on GitHub. Develop something cool and share it with the world! We are always looking for new contributions — feel free to contact us on our developers discussion group.

By Tom Fitzgerald, Google Engineering

Teaching the next generation to code: Young Coders at PyTennessee 2014

Wednesday, March 12, 2014

The Google Open Source team recently sponsored the PyTennessee conference in Nashville. Adam Fletcher, an Engineer at Google and today's guest blogger, volunteered at the conference and helped introduce Python to an enthusiastic group of students. 

On February 23rd & 24th the first PyTennessee took place in Nashville, Tennessee, and brought hundreds of pythonistas from all over the nation to learn about a diverse set of Python-related topics. On Saturday the 24th, PyTennessee ran a Young Coders event, based on a similar event that took place at the 2013 US PyCon. Google was proud to sponsor this event, providing funding for the Raspberry Pi computers the coders used throughout the day.
 Mayor of Nashville, Karl Dean, with the students

The Young Coders event introduced 25 new programmers, aged 12-18, to the world of Python by providing each student with a Raspberry Pi running Linux and a day of instruction in the Python programming language. Students were taught about the basic data types and control flow in Python in the morning and then spent the afternoon making and modifying games. When the event wrapped up the students got to take home their Raspberry Pi computers to continue their programming exploration at home. Additionally, the students each got a copy of Python For Kids, an excellent introductory book.
Raspberry Pi, the compact computer the students used to learn Python

Earlier in the day the Mayor of Nashville, Karl Dean, stopped by to learn about the Young Coders event and to talk to the students. Mayor Dean was excited about Nashville as a technology center; Nashville is one of the cities being evaluated for Google Fiber, and Google has selected Nashville as one of the Google for Entrepreneurs Tech Hub Network cities.

Later, the students used their newfound Python knowledge to modify various games. Students altered the startup screen, changed the frame rates, modified the fundamental rules, and made other fun changes to games written in the PyGame framework.
Two students hard at work

Katie Cunningham (right) with two Young Coders

The Young Coders event would not have been successful without its excellent instructor, Katie Cunningham. Big thanks to her and to the entire PyTennessee team for organizing such a wonderful event, and for providing the space to help train the next generation of computer scientists!

By Adam Fletcher, Google Site Reliability Engineer

Oppia: a tool for interactive learning

Wednesday, February 26, 2014

"I hear and I forget; I see and I remember; I do and I understand." — Confucius

Lots of online education is delivered using video and text. However, opportunities for learners to do things and get feedback on their work are also important — after all, one does not learn to play the piano by watching videos of many virtuoso performances.

We're excited to announce Oppia, a project that aims to make it easy for anyone to create online interactive activities, called 'explorations', that others can learn from. Oppia does this by modeling a mentor who poses questions for the learner to answer. Based on the learner's responses, the mentor decides what question to ask next, what feedback to give, whether to delve deeper, or whether to proceed to something new. You can think of this as a smart feedback system that tries to “teach a person to fish”, instead of simply revealing the correct answer or marking the submitted answer as wrong. If you’d like to get an idea of what these explorations are like, you can try out some examples at www.oppia.org.

The Oppia learning interface.

The Oppia editing interface.

A unique feature of Oppia is that it allows multiple people from around the world to create and collaborate on explorations. They can do this through a web interface — no programming required.

Oppia gathers data on how learners interact with it, making it easy for exploration authors to spot and fix shortcomings in an exploration. They would do this by logging in, finding an answer that many learners are giving but which the system is not responding to adequately, and creating a new learning path for it, based on what they would actually say if they were interacting in-person with the learner. Oppia can then give this feedback to future learners.
A video by Yana Malysheva, one of the developers, explaining how Oppia works.
Oppia knows how to deal with numeric, text, and multiple choice inputs, as well as some more specialized types such as a clickable map and a code evaluator. We've also built an extensible framework that lets developers extend the range of input types that Oppia can understand.

The explorations created on an Oppia server can be embedded in any web page. These embeddings can refer to a particular version, so that further changes to the canonical version of the exploration do not automatically appear in the embedded one. This feature allows learning experiences that have been created using Oppia explorations to retain their integrity over time.

Oppia is built using Python and AngularJS on top of Google App Engine. You can download the source code; we hope you find it useful! Please feel free to contribute suggestions through our issue tracker, or contact us at our developers discussion group. We actively welcome new contributors, so if you would like to help out, please don't hesitate to get in touch.

By Sean Lip, Software Engineer, Google Research

Students add to SymPy

Monday, December 12, 2011



SymPy is a computer algebra system (CAS) written in pure Python. The core allows basic manipulation of expressions (like differentiation or expansion) and it contains many modules for common tasks (limits, integrals, differential equations, series, matrices, quantum physics, geometry, plotting, and code generation).
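A quick taste of that core, using the standard SymPy API:

from sympy import symbols, diff, expand, integrate, limit, sin, oo

x = symbols('x')
print(expand((x + 1)**3))               # x**3 + 3*x**2 + 3*x + 1
print(diff(x * sin(x), x))              # x*cos(x) + sin(x)
print(integrate(1 / x**2, (x, 1, oo)))  # 1
print(limit(sin(x) / x, x, 0))          # 1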

SymPy has participated in the Google Summer of Code program in previous years under the umbrellas of Python Software Foundation, Portland State University, and the Space Telescope Science Institute, where we were very successful. In fact, several of our core developers, including four of the mentors from this year, started working with SymPy as Google Summer of Code students. This was our first year participating as a standalone organization, and we would like to share our experience.

As part of the application process we required each student to submit a patch (as a GitHub pull request) that had to be reviewed and accepted. This allowed us to see that each applicant knew how to use git as well as communicate effectively during the review process. It also encouraged only serious applicants to apply. We had over 10 mentors available and we ended up with 9 students, all of whom were successful at final evaluations.

Tom implemented an algorithm for computing symbolic definite integrals that uses so-called Meijer G-functions. This is the state-of-the-art algorithm for computing definite integrals, and indeed the results of his project are very impressive. This project has pushed SymPy forward a long way to becoming the strongest open source computer algebra system with respect to symbolic definite integration.

Vladimir Peric - Porting to Python 3, mentored by Ronan Lamy
Vladimir ported SymPy to work on Python 3 and ported all testing infrastructure so that SymPy gets regularly tested in Python 2.x, 3.2 and PyPy. Thanks to Vladimir’s work, the next version of SymPy, 0.7.2, which will hopefully be released later this year, will work in both Python 2 and Python 3, and it may support PyPy as well.

Gilbert Gede - PyDy, mentored by Luke Peterson
Gilbert implemented a physics module to assist in generating symbolic equations of motion for complex multibody systems using Kane's Method. He expanded on the code written by his mentor, Luke, in 2009, and the module can now generate equations of motion for a bicycle. Gilbert also wrote very thorough documentation both for the Kane’s Method and the module in SymPy.

Tomo has greatly improved the quantum mechanics module by implementing position/momentum representations for operators and eigenstates in various coordinate systems (including Cartesian, cylindrical, and spherical), which allows you to easily represent many of the "textbook" quantum mechanics systems, including particle in a box, simple harmonic oscillator, hydrogen atom, etc.

Saptarshi Mandal - Combinatorics package for Sympy, mentored by Christian Muise
Saptarshi’s project was to mimic the various capabilities of Combinatorica, a Mathematica package for combinatorics. Most of the functionality involving elementary combinatorial objects such as Permutations, Partitions, Subsets, Gray codes and Prufer codes is complete.

Sherjil Ozair - Symbolic Linear Algebra, mentored by Vinzent Steinberg
Sherjil improved the speed of the linear algebra module by using efficient coefficient types for the values of matrix entries. Previously, SymPy used generic expressions in this place, which slowed down computations considerably and caused trouble when solving the zero-equivalence problem. He also implemented a sparse matrix representation and unified its API with dense matrices. Sherjil also added a few linear algebra related algorithms (e.g. Cholesky decomposition).

Matthew improved the statistics module to use symbolics and introduced a Random Variable type, with support for finite, continuous, and multivariable normal random variables. With these you can symbolically compute things like probabilities of a given condition, conditional spaces, and expectation values. As a side consequence of this project, he also improved some of our Sets classes and implemented a MatrixExpr class, which allows you to compute with matrices symbolically, including computing with block matrices.

Sean worked on the quantum mechanics module and implemented symbolic Clebsch-Gordan coefficients, the Wigner D function, and related mathematical concepts that are used very often in quantum physics when dealing with angular momentum, as well as the necessary classes to support coupled spin algebra.

Jeremias Yehdegho - Implementing F5, mentored by Mateusz Paprocki
Jeremias worked on implementing algorithms related to Groebner bases. Groebner bases are a useful tool in many areas of computer algebra. He implemented the F5B algorithm, which is an improved version of the classical Buchberger’s algorithm that was previously implemented in SymPy, as well as an algorithm for converting Groebner bases between different monomial orderings, and he worked on applications of Groebner bases. This allowed for handling problems of much larger size in SymPy.

The full report can be found here, where each student wrote a wiki page about their experience during the summer and you can also find their blogs and links to applications. Each student was required to blog about their progress each week and all blogs were synchronized at planet.sympy.org.

In previous years, there was usually one student from each summer who became a regular contributor and also a mentor for the next year. It has been a rewarding experience for the whole SymPy community.

By Ondřej Čertík, Aaron Meurer and Mateusz Paprocki, SymPy Google Summer of Code Mentors